Hi Steve,

One of the users has previously suggested a different approach to the problem you are experiencing: call ibis::table::mergeCategories before any queries that group by categorical values. Would you mind taking a look to see whether that option works for you?
In the meantime, let me see if I can combine your code with the existing option.

John

On 10/16/15 10:36 PM, Enns, Steven wrote:
> Hi John,
>
> We noticed that a simple group by query took disproportionately long
> on multiple partitions compared to a single partition. The profiler
> indicates that the bottleneck is in converting the original column
> from ids to strings (lots of string allocs), and then the group by
> operations (sort, reduce) are done on strings instead of the category
> ids. The reason for the string conversion seems to be that the ids
> aren’t consistent across the partitions. Instead I propose re-mapping
> the ids into a shared dictionary in ibis::bord::column::append. The
> diff is attached, we observe about 5x-10x speedup depending on the
> number of columns in the group-by.
>
> Steve
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
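The re-mapping Steve describes can be sketched roughly as follows. This is a minimal illustration that assumes each partition stores a local dictionary (id to string) plus one integer id per row. `PartitionColumn`, `SharedCategoryColumn`, and their members are hypothetical names. This is not FastBit's `ibis::bord::column::append` and not the attached diff. The point is that each partition's dictionary is translated once, so rows are remapped with integer lookups and the group-by counts run on ids instead of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// One partition's categorical column: a local dictionary plus per-row ids
// that index into it. Hypothetical type, not FastBit's ibis::bord::column.
struct PartitionColumn {
    std::vector<std::string> dict;   // local id -> string
    std::vector<std::uint32_t> ids;  // one local id per row
};

// Hypothetical column whose ids all refer to one shared dictionary.
class SharedCategoryColumn {
public:
    // Append one partition, remapping its local ids to shared ids.
    void append(const PartitionColumn& part) {
        // Build a translation table once per partition: one string lookup
        // per distinct value rather than one per row.
        std::vector<std::uint32_t> translate;
        translate.reserve(part.dict.size());
        for (const std::string& s : part.dict) {
            auto it = index_.find(s);
            if (it == index_.end()) {
                const auto id = static_cast<std::uint32_t>(dict_.size());
                it = index_.emplace(s, id).first;
                dict_.push_back(s);
            }
            translate.push_back(it->second);
        }
        // Remap every row with an integer lookup; no strings are allocated.
        ids_.reserve(ids_.size() + part.ids.size());
        for (std::uint32_t local : part.ids) {
            assert(local < translate.size());
            ids_.push_back(translate[local]);
        }
    }

    // Group-by count on the shared integer ids; strings are only needed
    // when a result is reported via label().
    std::vector<std::size_t> countById() const {
        std::vector<std::size_t> counts(dict_.size(), 0);
        for (std::uint32_t id : ids_) ++counts[id];
        return counts;
    }

    const std::string& label(std::uint32_t id) const { return dict_[id]; }
    const std::vector<std::uint32_t>& ids() const { return ids_; }

private:
    std::vector<std::string> dict_;                          // shared id -> string
    std::unordered_map<std::string, std::uint32_t> index_;   // string -> shared id
    std::vector<std::uint32_t> ids_;                         // shared id per row
};
```

For example, appending a partition with dictionary {"red", "blue"} and ids {0, 1, 0}, then one with {"blue", "green"} and ids {0, 0, 1}, yields shared ids {0, 1, 0, 1, 1, 2}, and `countById()` gives 2 for red, 3 for blue, and 1 for green. The trade-off is a hash lookup per distinct value at append time in exchange for integer-only sorting and reduction at query time.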
