Hi, Andrew,

Here is a tiny example trying to reproduce your test. It seems to produce the expected results. Could you please take a quick look to see if it matches your use case?

Thanks.

John


On 9/19/11 3:17 PM, Olson, Andrew wrote:
Hi John,
The pos column has the large values; "CT" is UBYTE.  So there is a problem with 
the column order as well. Maybe fixing that will fix the orderby?
Andrew

On Sep 19, 2011, at 5:30 PM, K. John Wu wrote:

Hi, Andrew,

As far as I can tell, you are using the API correctly.

In the printout, the first column should be "pos" and the second column "CT".  
Therefore, the first printout is already ordered by pos and the orderby function has nothing to do.

If you are actually expecting the second column in the printout to be "pos," 
then it is possible that some other part of the FastBit code is not behaving as it should.

John


On 9/16/11 10:56 AM, Olson, Andrew wrote:
Hi John,
I noticed that the orderby isn't actually sorting the table::select results.  
Below is a snippet of my code:

   ibis::table *res = tbl->select("pos,CT",qcnd);
   res->dump(std::cout, "\t");
   res->orderby("pos");
   std::cout << "nRows =" << res->nRows() << std::endl;
   res->dump(std::cout, "\t");

And this is the sample output:
2       107013068
5       107013059
6       107013070
nRows = 3
binsize = 0
2       107013068
5       107013059
6       107013070

Am I using the API properly?  The data is stored across multiple partitions.

Andrew

On Aug 31, 2011, at 5:01 PM, K. John Wu wrote:

Hi, Andrew,

Currently, there is no logic in the code to recognize that the two parts of the 
data are sorted already.  Therefore, the orderby call will use a generic 
sorting procedure.
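
To make the difference concrete, here is a rough standalone sketch (plain C++, not FastBit code) comparing a generic sort over the concatenated rows with a linear-time merge that relies on each partition already being sorted. The `Row` layout and the function names `sortAll` and `mergeSorted` are hypothetical and exist only for this illustration; nothing here reflects how FastBit implements orderby internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// One result row: (position, score). Hypothetical layout, not a FastBit type.
using Row = std::pair<uint32_t, uint32_t>;

// Compare rows by position only.
inline bool byPosition(const Row& a, const Row& b) {
    return a.first < b.first;
}

// Generic approach: concatenate both partitions, then sort. O(n log n).
std::vector<Row> sortAll(const std::vector<Row>& a, const std::vector<Row>& b) {
    std::vector<Row> out(a);
    out.insert(out.end(), b.begin(), b.end());
    std::stable_sort(out.begin(), out.end(), byPosition);
    return out;
}

// Merge approach: uses the fact that each partition is already sorted. O(n).
std::vector<Row> mergeSorted(const std::vector<Row>& a, const std::vector<Row>& b) {
    // Precondition: both inputs are sorted by position.
    assert(std::is_sorted(a.begin(), a.end(), byPosition));
    assert(std::is_sorted(b.begin(), b.end(), byPosition));
    std::vector<Row> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out), byPosition);
    return out;
}
```

Both functions return the rows in the same order; the merge simply avoids the extra log factor when the sortedness of the inputs is known in advance.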

We are contemplating revamping the groupby and orderby operations, and we will keep 
this in mind when we redesign things.

In the meantime, hopefully the sorting cost is not too high and the response time 
of the additional orderby is tolerable.

John



On 8/31/11 1:41 PM, Olson, Andrew wrote:
Hi,
I have a data set that is split into two partitions (A and B).  Each partition 
has columns position and score.  The -part.txt file has a metaTags entry 
corresponding to the partition (A or B).  Usually I query only one partition at 
a time, but I would like to query across both partitions by pointing to the 
parent directory.

tbl = ibis::table::create(datadir);
res = tbl->select("position,score",qcnd);

Later on I retrieve the positions and scores as follows:

uint64_t ierr = res->getColumnAsUInts("position", positions);
ierr = res->getColumnAsUInts("score", scores);

The code that follows expects the positions to be sorted (they are sorted in each 
partition - and -part.txt has "sorted = true").
I suspect I need to do this before populating positions[] and scores[]:
sorted_results = res->orderby("position");

So my question is, since each partition is already sorted by position, is there 
logic in place that uses this info to do the orderby more quickly?  On a 
related note, if I add the orderby() to my code, will it slow down when 
querying only one partition (already sorted by position)?
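
One possible alternative, sketched below under stated assumptions, is to sort on the client side after the two arrays have been filled. The sketch is plain C++ and does not call FastBit; the helper name `sortByPosition` is hypothetical, and it assumes both columns have been copied into `std::vector<uint32_t>` containers of equal length. It checks first whether the positions are already in order, so the single-partition case only pays for one linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Reorder positions and scores together so that positions are ascending.
// Hypothetical helper for illustration; not part of the FastBit API.
void sortByPosition(std::vector<uint32_t>& positions,
                    std::vector<uint32_t>& scores) {
    assert(positions.size() == scores.size());

    // Fast path: nothing to do if the positions are already sorted,
    // e.g. when the rows come from a single sorted partition.
    if (std::is_sorted(positions.begin(), positions.end())) return;

    // Sort a permutation of row indices by position (stable for ties).
    std::vector<std::size_t> idx(positions.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&positions](std::size_t i, std::size_t j) {
                         return positions[i] < positions[j];
                     });

    // Apply the permutation to both columns so rows stay paired.
    std::vector<uint32_t> p(positions.size()), s(scores.size());
    for (std::size_t k = 0; k < idx.size(); ++k) {
        p[k] = positions[idx[k]];
        s[k] = scores[idx[k]];
    }
    positions.swap(p);
    scores.swap(s);
}
```

Sorting an index permutation rather than the values themselves keeps each score attached to its position, which matters when the two columns are retrieved into separate arrays.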

Thanks,
Andrew
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users


Attachment: ao1109.tar.gz
Description: GNU Zip compressed data
