Hi John-

It looks like we found the issue. We have concurrent queries that hit the data, and we weren’t building the indexes offline. This introduced a race condition where two queries would build the indexes at the same time and step on each other. We are generating the indexes offline now, and since doing so we haven’t seen any further issues.

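For anyone who cannot move index generation offline, a possible alternative is to serialize index building inside the query process. The following is only a minimal sketch of that idea, not what was done here: `IndexBuildGuard` and the `build` callback are hypothetical names and are not part of FastBit. The callback stands in for whatever actually writes the index files.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Hypothetical helper (not a FastBit class): makes sure the index for a
// given partition directory is built by at most one caller, even when
// several query threads request it at the same time.
class IndexBuildGuard {
public:
    // Runs `build(partitionDir)` only if no earlier call has already
    // completed a build for this directory. Returns true when this call
    // performed the build, false when the build was skipped.
    bool buildOnce(const std::string& partitionDir,
                   const std::function<void(const std::string&)>& build) {
        // One coarse lock: builds for different partitions are also
        // serialized, which is simple but not the fastest option.
        std::lock_guard<std::mutex> lock(mutex_);
        if (built_.count(partitionDir) != 0) {
            return false;  // another caller already built this index
        }
        build(partitionDir);          // if this throws, nothing is recorded
        built_.insert(partitionDir);  // record only after a successful build
        return true;
    }

private:
    std::mutex mutex_;
    std::set<std::string> built_;
};
```

Each query thread would call `buildOnce` with its partition directory before touching the index; a second thread arriving during a build waits on the lock and then skips the build instead of writing the same files concurrently.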
Sorry for the false alarm =)

Thanks!

Sean

On Jul 1, 2014, at 11:08 PM, Sean McNamara <[email protected]> wrote:

> John-
>
> Sounds great, I will try this first thing in the AM.
>
> Thanks again!
>
> Sean
>
> On Jul 1, 2014, at 10:47 PM, K. John Wu <[email protected]> wrote:
>
>> Hi, Sean,
>>
>> On the face of it, it appears that you are running out of memory. You
>> can tell FastBit file manager that there is more memory to use by calling
>>
>> ibis::fileManager::adjustCacheSize
>>
>> with something larger before you run that query.
>>
>> It is also likely that FastBit is neglecting to free some indexing
>> data structure since you are going through a lot of data partitions.
>>
>> The delay is intended to see if another thread has released anything.
>> Looks like you might be running with a single thread, so that waiting
>> is not doing anything useful. Will see if there is an easy way to
>> detect this situation and disable the waiting thing..
>>
>> John
>>
>>
>> On 7/1/14, 5:19 PM, Sean McNamara wrote:
>>> Hi John-
>>>
>>> We are chasing down a small issue that sporadically occurs within our data
>>> partitions and was curious if you might have any insight (we are using
>>> 1.3.9).
>>>
>>>
>>> We generate data partitions offline and periodically download new
>>> partitions into a temp folder, and then issue a ‘mv’ command into our data
>>> dir.
>>> What sometimes occurs is that a newly downloaded partition will have
>>> an error like this:
>>>
>>>
>>> Warning -- column[Column index metadata_JvC8R1.sample](LONG)::estimateRange
>>> -- received a std::exception -- bitvector::decompress failed to allocate
>>> array to uncompressed bits
>>> Tue Jul 1 23:51:14 2014
>>> Warning -- column[Column index metadata_JvC8R1.sample](LONG)::estimateRange
>>> -- received a string exception -- or_c2 internal error
>>> Tue Jul 1 23:51:15 2014
>>> Warning -- column[Column index metadata_JvC8R1.sample](LONG)::estimateRange
>>> -- received a std::exception -- bitvector::decompress failed to allocate
>>> array to uncompressed bits
>>> Tue Jul 1 23:51:16 2014
>>> Warning -- column[Column index metadata_JvC8R1.sample](LONG)::estimateRange
>>> -- received a std::exception -- bitvector::decompress failed to allocate
>>> array to uncompressed bits
>>>
>>>
>>> Each warning causes the fastbit library to block for 1 second. So if there
>>> are 30 partitions with this issue it would take 30 seconds to run. We are
>>> able to ‘fix’ the partition by deleting and re-downloading the partition.
>>> We don’t have to re-generate the partition, so it seems like the issue is
>>> not how we generate the partition. It seems to be how it’s being copied in
>>> or if fastbit does something on the first run. Also, we only ever see
>>> this issue with INT and LONG fields for some reason.
>>>
>>>
>>> We’re working to deterministically reproduce the issue and will let you
>>> know our findings. Let me know if you have any further insight into what
>>> we’re seeing.
>>>
>>> Thanks!
>>>
>>> Sean
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
