Thanks Billie/Josh! That indeed fixes the issue; the scan now returns instantly!

So when we scan the whole table and filter by column family, Accumulo still has to go through all rows (ordered by key) and check whether each entry has the requested column family. In my case, since the column families are intermingled, the data I am looking for could be somewhere in the middle or at the end of the rfile. Am I right?

I did another experiment: if I specify -b and -e, the scan also returns instantly (this was before I moved them to a different locality group and compacted). That makes sense, because Accumulo can narrow the scan down to the specific range and then filter by column family.

I have a follow-up question: does this mean I have to create a new locality group for each column family, since I won't know in advance how big or small the data belonging to that cf will be? Btw, we shard the customers by using their id as the column family, so we'll add a new column family whenever a new customer is onboarded. I think the case where we scan the table by cf without specifying a range will be rare (or perhaps never happen, except when I run it from the shell), but I am worried that this could become a perf bottleneck if I don't put them in separate locality groups.

Another question: when running the setgroups command, it looks like I have to list all of the column families, even if I have just added one new cf. For example, let's say I did:

    setgroups mygroup=cf1,cf2 -t mytable
    compact -t mytable -w

Then later I need to add cf3 to the same group. I have to do "setgroups mygroup=cf1,cf2,cf3 -t mytable" instead of just "setgroups mygroup=cf3 -t mytable". It'd be nice if I could do the latter :-) What happens to cf1 and cf2 if I do the latter? Do they go back to the default group after compaction?

Thanks,
Z
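
A minimal sketch of the workaround for the setgroups question above, assuming (as the post suspects, not confirmed here) that setgroups replaces a group's definition rather than appending to it: take the current group membership, add the new column family, and write the full list back. The class and method names (LocalityGroupMerge, withAddedFamily, toSetgroupsArg) are hypothetical, and the map is built from plain Java collections rather than read from Accumulo.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Accumulo: merges one new column family
// into an existing locality-group layout without losing the old members.
class LocalityGroupMerge {

    // Returns a copy of `current` with `cf` added to `group`.
    // The input map is left unmodified.
    static Map<String, Set<String>> withAddedFamily(
            Map<String, Set<String>> current, String group, String cf) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : current.entrySet()) {
            merged.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        // Create the group if it does not exist yet, then add the family.
        merged.computeIfAbsent(group, g -> new TreeSet<>()).add(cf);
        return merged;
    }

    // Formats one group in the "name=cf1,cf2,..." form used in the post.
    static String toSetgroupsArg(String group, Set<String> families) {
        return group + "=" + String.join(",", families);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> current = new TreeMap<>();
        current.put("mygroup", new TreeSet<>(Arrays.asList("cf1", "cf2")));

        Map<String, Set<String>> updated = withAddedFamily(current, "mygroup", "cf3");
        System.out.println("setgroups "
                + toSetgroupsArg("mygroup", updated.get("mygroup"))
                + " -t mytable");
    }
}
```

In a client program the current layout would come from the table itself rather than being hard-coded; the Accumulo Java client exposes getLocalityGroups and setLocalityGroups on TableOperations for that, though wiring them in is left out of this sketch.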
