Hi John,
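I have been storing data in multiple partitions, using metaTags to identify the 
partitioning, and I get wrong results when a query matches rows in more than 
one partition.  For example, this query fails because multiple partitions have 
matches; all 331230 records are reported in a single row under FBchr 1: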
$ thula -d . -s "FBchr,count(*)" -w "1=1"
doQuery(1=1) evaluated on T-1 produced 1 hit out of 331230 records
-- begin printing the result table --
Table (in memory) _8PVC (GROUP BY FBchr,count(*) on table SF5D42 (GROUP BY 
FBchr, COUNT(*) on table OAiQa1)) contsists of 2 columns and 1 row
FBchr   UINT (dictionary size: 0)
_1      UINT
1, 331230
-- end printing --

And this one works because the matches are all in one partition:
$ thula -d . -s "FBchr,count(*)" -w "FBchr='1'"
doQuery(FBchr='1') evaluated on T-1 produced 1 hit out of 331230 records
-- begin printing the result table --
Table (in memory) UIpJq2 (GROUP BY FBchr,count(*) on table o0Lu8 (GROUP BY 
FBchr, COUNT(*) on table _qULt2)) contsists of 2 columns and 1 row
FBchr   UINT (dictionary size: 0)
_1      UINT
1, 51976
-- end printing --

I'm not sure whether this also happens with normal CATEGORY columns when the 
dictionaries differ between partitions.

It seems like a bug in some output functions, which use one dictionary for a 
whole column rather than partition-specific dictionaries.  Would it be useful 
to have a command-line tool that merges dictionaries and updates the .int and 
.idx files across a set of partitions?  It could also drop entries from each 
merged dictionary that are not used in any of the partitions.
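
To make the idea concrete, here is a rough sketch of the merge step in Python. 
It is only an illustration under stated assumptions: each partition is modelled 
as an in-memory (dictionary, codes) pair with 0-based codes, the function name 
merge_dictionaries is hypothetical, and nothing here reads or writes real 
FastBit dictionary, .int or .idx files (FastBit's actual code numbering and 
file layout may differ).

```python
def merge_dictionaries(partitions):
    """Merge per-partition dictionaries and remap the integer codes.

    partitions: list of (dictionary, codes) pairs, where dictionary is a
    list of strings and each code is a 0-based index into that list.
    This in-memory layout is an assumption for illustration only; it is
    not FastBit's on-disk format.
    """
    # Keep only the values that some row actually references, so unused
    # dictionary entries are dropped from the merged dictionary.
    used = set()
    for dictionary, codes in partitions:
        for code in codes:
            used.add(dictionary[code])

    # One shared dictionary for the whole column, in sorted order.
    merged = sorted(used)
    new_code = {value: i for i, value in enumerate(merged)}

    # Rewrite every partition's codes against the shared dictionary.
    remapped = [
        [new_code[dictionary[code]] for code in codes]
        for dictionary, codes in partitions
    ]
    return merged, remapped


# Two partitions whose dictionaries disagree: code 0 means "1" in the
# first partition but "3" in the second.
partitions = [
    (["1", "2", "X"], [0, 0, 1]),
    (["3", "1", "unused"], [0, 1, 0]),
]
merged, remapped = merge_dictionaries(partitions)
print(merged)    # ['1', '2', '3']
print(remapped)  # [[0, 0, 1], [2, 0, 2]]
```

After the remap, a given code means the same value in every partition, so a 
function that decodes the column with a single dictionary would give the right 
answer.  A real tool would also have to rebuild the indexes, since the codes 
they were built on have changed.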

Andrew
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
