Hi, Andrew,
As far as I can tell, you are using the API correctly.
In the printout, the first column should be "pos" and the second column
"CT". Therefore, the first printout is already ordered by pos and
the orderby function has nothing to do.
If you are actually expecting the second column in the printout to be
"pos", then it is possible that some other part of the FastBit code is not
behaving as it should.
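
One quick way to confirm which column is already in order is to read the
values back and test them directly. The following is only a minimal sketch
using the C++ standard library; isAscending is a hypothetical helper name,
not part of FastBit, and it assumes the column values have already been
copied into a std::vector<uint32_t>.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of FastBit: returns true when the values
// are in non-decreasing order, which is what an ascending sort would produce.
inline bool isAscending(const std::vector<uint32_t>& values) {
    return std::is_sorted(values.begin(), values.end());
}
```

With the values from your sample output, isAscending({2, 5, 6}) is true,
while isAscending({107013068, 107013059, 107013070}) is false, so only the
first column is in ascending order.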
John
On 9/16/11 10:56 AM, Olson, Andrew wrote:
> Hi John,
> I noticed that the orderby isn't actually sorting the table::select results.
> Below is a snippet of my code:
>
> ibis::table *res = tbl->select("pos,CT",qcnd);
> res->dump(std::cout, "\t");
> res->orderby("pos");
> std::cout<< "nRows = "<< res->nRows()<< std::endl;
> res->dump(std::cout, "\t");
>
> And this is the sample output:
> 2 107013068
> 5 107013059
> 6 107013070
> nRows = 3
> binsize = 0
> 2 107013068
> 5 107013059
> 6 107013070
>
> Am I using the API properly? The data is stored across multiple partitions.
>
> Andrew
>
> On Aug 31, 2011, at 5:01 PM, K. John Wu wrote:
>
>> Hi, Andrew,
>>
>> Currently, there is no logic in the code to recognize that the two parts of
>> the data are sorted already. Therefore, the orderby call will use a generic
>> sorting procedure.
>>
>> We are contemplating revamping the groupby and orderby operations, and we will
>> keep this in mind when we redesign things.
>>
>> In the meantime, hopefully the sorting cost is not too high and the response
>> time of the additional orderby is tolerable.
>>
>> John
>>
>>
>>
>> On 8/31/11 1:41 PM, Olson, Andrew wrote:
>>> Hi,
>>> I have a data set that is split into two partitions (A and B). Each
>>> partition has columns position and score. The -part.txt file has a
>>> metaTags entry corresponding to the partition (A or B). Usually I query
>>> only one partition at a time, but I would like to query across both
>>> partitions by pointing to the parent directory.
>>>
>>> tbl = ibis::table::create(datadir);
>>> res = tbl->select("position,score",qcnd);
>>>
>>> Later on I retrieve the positions and scores as follows:
>>>
>>> int64_t ierr = res->getColumnAsUInts("position", positions);
>>> ierr = res->getColumnAsUInts("score", scores);
>>>
>>> The code that follows expects the positions to be sorted (they are sorted
>>> in each partition - and -part.txt has "sorted = true").
>>> I suspect I need to do this before populating positions[] and scores[]:
>>> sorted_results = res->orderby("position");
>>>
>>> So my question is, since each partition is already sorted by position, is
>>> there logic in place that uses this info to do the orderby more quickly?
>>> On a related note, if I add the orderby() to my code, will it slow down
>>> when querying only one partition (already sorted by position)?
>>>
>>> Thanks,
>>> Andrew
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>