Hi, Steve,

One of the users has previously suggested a different approach to the
problem you are experiencing.  The suggested solution was to call
ibis::table::mergeCategories before any queries involving a group-by on
categorical values.  Would you mind taking a look to see if that option
works for you?

In the meantime, let me see if I can combine your code with the
existing option.

John

On 10/16/15 10:36 PM, Enns, Steven wrote:
> Hi John,
> 
> We noticed that a simple group-by query took disproportionately long
> on multiple partitions compared to a single partition.  The profiler
> indicates that the bottleneck is converting the original column from
> ids to strings (lots of string allocations), after which the group-by
> operations (sort, reduce) are done on strings instead of the category
> ids.  The reason for the string conversion seems to be that the ids
> aren’t consistent across the partitions.  Instead, I propose re-mapping
> the ids into a shared dictionary in ibis::bord::column::append.  The
> diff is attached; we observe about a 5x-10x speedup, depending on the
> number of columns in the group-by.
> 
> Steve
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 
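The re-mapping Steve describes can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the attached diff: it assumes each partition stores a categorical column as a local dictionary (a vector of strings indexed by id) plus a vector of local ids. `SharedDictionary`, `appendRemapped`, and `countByCategory` are hypothetical names chosen for the example and are not part of FastBit's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared dictionary (not a FastBit class): maps each distinct
// category string to one global integer id, so ids agree across partitions.
class SharedDictionary {
public:
    // Returns the global id for `value`, assigning the next free id if the
    // string has not been seen before.
    std::uint32_t idOf(const std::string& value) {
        auto it = index_.find(value);
        if (it != index_.end()) return it->second;
        const auto id = static_cast<std::uint32_t>(words_.size());
        index_.emplace(value, id);
        words_.push_back(value);
        return id;
    }

    const std::string& word(std::uint32_t id) const { return words_.at(id); }
    std::size_t size() const { return words_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> index_;
    std::vector<std::string> words_;
};

// Hypothetical helper: appends one partition's categorical column to `out`,
// translating partition-local ids (indices into `localDict`) into shared ids.
// The dictionary is consulted once per distinct local value; the per-row work
// is a single integer table lookup with no string copies.
void appendRemapped(const std::vector<std::string>& localDict,
                    const std::vector<std::uint32_t>& localIds,
                    SharedDictionary& shared,
                    std::vector<std::uint32_t>& out) {
    std::vector<std::uint32_t> translate(localDict.size());
    for (std::size_t i = 0; i < localDict.size(); ++i)
        translate[i] = shared.idOf(localDict[i]);
    out.reserve(out.size() + localIds.size());
    for (std::uint32_t id : localIds) {
        assert(id < translate.size());
        out.push_back(translate[id]);
    }
}

// Hypothetical group-by count performed directly on the shared integer ids.
std::vector<std::size_t> countByCategory(const std::vector<std::uint32_t>& ids,
                                         std::size_t numCategories) {
    std::vector<std::size_t> counts(numCategories, 0);
    for (std::uint32_t id : ids) ++counts[id];
    return counts;
}
```

For example, appending a partition with dictionary {"red", "blue"} and then one with {"blue", "green"} gives "blue" the same shared id in both, so the merged column can be sorted and reduced as plain integers; strings are only needed again when the final result is printed.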