Hi John,

We noticed that a simple group-by query takes disproportionately long on 
multiple partitions compared to a single partition.  The profiler indicates 
that the bottleneck is converting the original column from ids to strings 
(lots of string allocations), after which the group-by operations (sort, 
reduce) run on strings instead of the category ids.  The string conversion 
seems to be needed because the ids aren't consistent across the partitions.  
Instead, I propose re-mapping the ids into a shared dictionary in 
ibis::bord::column::append.  The diff is attached; we observe about a 5x-10x 
speedup, depending on the number of columns in the group-by.
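
To illustrate the idea (this is not the attached diff, and it does not use 
FastBit's actual classes), here is a minimal standalone sketch of re-mapping 
per-partition ids into a shared dictionary at append time.  The names 
SharedDictionary and appendRemapped are hypothetical, and the sketch assumes 
each partition carries its own id-to-string dictionary with ids starting at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared dictionary: a single string <-> id mapping that all
// partitions are translated into.  Not a FastBit type.
struct SharedDictionary {
    std::vector<std::string> values;                  // shared id -> string
    std::unordered_map<std::string, uint32_t> index;  // string -> shared id

    // Return the shared id for s, assigning the next free id if s is new.
    uint32_t intern(const std::string& s) {
        auto it = index.find(s);
        if (it != index.end()) return it->second;
        uint32_t id = static_cast<uint32_t>(values.size());
        values.push_back(s);
        index.emplace(s, id);
        return id;
    }
};

// Append one partition's category ids to `out`, translated to shared ids.
// localDict[i] is the string that local id i stands for in this partition.
void appendRemapped(const std::vector<std::string>& localDict,
                    const std::vector<uint32_t>& localIds,
                    SharedDictionary& shared,
                    std::vector<uint32_t>& out) {
    // Build the local -> shared translation table once per partition, so
    // string lookups happen per distinct value rather than per row.
    std::vector<uint32_t> remap(localDict.size());
    for (std::size_t i = 0; i < localDict.size(); ++i)
        remap[i] = shared.intern(localDict[i]);

    // Per-row work is now a plain integer table lookup.
    out.reserve(out.size() + localIds.size());
    for (uint32_t id : localIds) {
        assert(id < remap.size());
        out.push_back(remap[id]);
    }
}
```

With the ids made consistent this way, the sort and reduce steps can compare 
integers, and strings only need to be materialized for the final result.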

Steve

Attachment: bord.cpp.diff
Description: bord.cpp.diff

Attachment: bord.h.diff
Description: bord.h.diff

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users