Ian, luckily I am in the development stages of my app, so I can freely throw away my data whenever I like (it is just syslog data from some very chatty networking hardware, so there is always more to be had). I have experienced these issues on 32 and 64 bit systems, though my main system is 64 bit. I just realized that while I have rebuilt elephant several times, I have not recompiled my own code in a while. I will try that, and if it fails, I'll figure out some way to to get you access to a test case. Thanks again for all of your help with this; I hope I am contributing to the stabilization of 1.0, and not just chasing my own bugs!
-Kevin Ian Eslick wrote: > This is very odd behavior. I have a live webapp that uses all this > stuff pretty heavily. > > I did see this active cursor message earlier, but it was due to some > debugging churn in the BDB transaction handler a day or two ago. > Those issues should all be resolved now, but it's possible something > isn't working right. Since I did make some changes to the main > transaction macros and that means that everything depending on those > macros needs to be compiled - asdf should do that automatically but > you may want to force a rebuild on the new elephant tree and your app. > > Sometimes I find it useful to manually run a full, recover-fatal pass > manually just to be sure BDB itself is happy. (db_recover -f) > > Deserialization errors have become pretty rare. I suppose it is > possible that you have a database that has a value that somehow got > corrupted. Have you tried your app on a fresh database? It might be > worth doing that to see if you can reproduce the problem. (How > critical is the data you have in there now?) > > By the way: > - Do you have a 32 or 64 bit BDB/SBCL? > - Have you double checked your my-config.sexp to make sure everything > is set right? > > Can you produce a stand-alone test case I can try locally? If there > aren't too many dependencies, I'd be happy to try reproduce this > locally or login to a machine and look at it remotely. > > By the way, stack traces should be a bit more meaningful now if you > want to send me the latest full stack trace. > > Another thing you can try is tracing 'get-value' and (setf get-value) > to see all the data you are reading/writing; perhaps it's one > particular value that is causing the serializer to choke? > > Ian > > On Jan 10, 2009, at 6:27 PM, Kevin Raison wrote: > >> Unfortunately, things remain as before. Here is some additional debug >> output that I found in the *inferior-lisp* buffer of my emacs session: >> >> Deserialization error in map: returning nil for element >> transaction has active cursors >> PANIC: Invalid argument >> PANIC: DB_RUNRECOVERY: Fatal error, run database recovery >> PANIC: fatal region error detected; run recovery >> >> >> >> The deserialization error happens in one of two spots: >> - when trying to get-from-root a btree or indexed btree >> - when trying to get a pset via a call to get-value on the btree >> >> Running recovery does not help; the errors start as soon as I load >> and >> execute my code post-recovery. >> >> Ian, can you enlighten me as to where in the code you believe these >> errors to originate? Also, can you think of anything that I might be >> doing incorrectly in my code that would cause this sort of behavior? >> Here is a brief outline of what I am doing: >> >> There is >> - a waitqueue (via sbcl's sb-thread:make-waitqueue) >> - a tcp listener thread that writes its input to an incoming work >> queue >> - 3 reader threads who use the waitqueue to get work items from the >> work queue >> The readers parse the incoming data and create a persistent object >> from >> it. They also index the individual words of one of the persistent >> object's slots using a btree. These actions are contained in a >> with-transaction block. >> There is also a hunchentoot web server thread running. Users can >> search >> the indexed text via this web interface. >> >> It is really rather simple. The errors crop up randomly, sometimes >> when >> I am running searches via the web interface while the listener and >> reader threads are active, and sometimes while the listener and reader >> threads are doing their thing without any web-based reads happening. >> >> Cheers, >> Kevin >> >> >> Kevin Raison wrote: >>> Thanks, Ian. All tests pass for me as well. I am running my >>> application with the new code and will let you know how it goes. >>> >>> Thanks again! >>> Kevin >>> >>> Ian Eslick wrote: >>>> I just checked in a few more fixes a few minutes ago. I finally was >>>> able to reproduce some of this locally. Try cleaning out the test >>>> database and re-running the tests. Everything passes for me on a >>>> fresh DB and my own application is running fine too. >>>> >>>> Ian >>>> >>>> On Jan 10, 2009, at 2:04 PM, Kevin Raison wrote: >>>> >>>>> After pulling the latest patches and rebuilding without >>>>> optimizations as >>>>> you suggest, I get the following errors, the second of which is >>>>> slightly >>>>> different than what I was receiving previously. >>>>> >>>>> First, a deserialization error: >>>>> Condition ELEPHANT-TYPE-DESERIALIZATION-ERROR was signalled. >>>>> [Condition of type ELEPHANT-TYPE-DESERIALIZATION-ERROR] >>>>> >>>>> Restarts: >>>>> 0: [RETRY] Retry SLIME REPL evaluation request. >>>>> 1: [ABORT] Return to SLIME's top level. >>>>> 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread" >>>>> RUNNING {10034F31F1}>) >>>>> >>>>> Backtrace: >>>>> 0: ((LAMBDA (SWANK-BACKEND::DEBUGGER-LOOP-FN)) #<FUNCTION >>>>> (LAMBDA #) >>>>> {10040B2389}>) >>>>> 1: (SWANK::DEBUG-IN-EMACS #<ELEPHANT-TYPE-DESERIALIZATION-ERROR >>>>> {1002CB3531}>) >>>>> 2: (SWANK:INVOKE-SLIME-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION- >>>>> ERROR >>>>> {1002CB3531}>) >>>>> 3: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION >>>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {1002CB3AC9}>) >>>>> 4: (INVOKE-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION-ERROR >>>>> {1002CB3531}>) >>>>> 5: ((FLET #:FUN23) #<ELEPHANT-TYPE-DESERIALIZATION-ERROR >>>>> {1002CB3531}>) >>>>> 6: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION >>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) ..)[:EXTERNAL] >>>>> 7: (SB-INT:SIMPLE-EVAL-IN-LEXENV (REINDEX-LOG-ENTRIES) #<NULL- >>>>> LEXENV>) >>>>> 8: (SWANK::EVAL-REGION "(reindex-log-entries)\n") >>>>> 9: ((LAMBDA ())) >>>>> 10: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {1002C596F9}>) >>>>> 11: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation >>>>> request." #<CLOSURE (LAMBDA #) {1002C59619}>) >>>>> 12: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #) >>>>> {1002C595E9}>) >>>>> 13: (SWANK::REPL-EVAL "(reindex-log-entries)\n") >>>>> >>>>> And then this one with any subsequent access to the bdb: >>>>> >>>>> Bad type argument: >>>>> >>>>> >>>>> BDB-DB-ERROR >>>>> [Condition of type SIMPLE-TYPE-ERROR] >>>>> >>>>> Restarts: >>>>> 0: [RETRY] Retry SLIME REPL evaluation request. >>>>> 1: [ABORT] Return to SLIME's top level. >>>>> 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread" >>>>> RUNNING {10034F31F1}>) >>>>> >>>>> Backtrace: >>>>> 0: (MAKE-CONDITION BDB-DB-ERROR)[:EXTERNAL] >>>>> 1: (ERROR BDB-DB-ERROR)[:EXTERNAL] >>>>> 2: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION >>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unused argument> #<unused >>>>> argument> >>>>> #<BDB-STORE-CONTROLLER /srv/console/db/> #<CLOSURE (LAMBDA #) >>>>> {10037C53D9$ >>>>> 3: (ADD-LOG-ENTRY)[:EXTERNAL] >>>>> 4: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ADD-LOG-ENTRY :SEVERITY 2 >>>>> :HOST-UUID "123" :TIMESTAMP ...) #<NULL-LEXENV>) >>>>> 5: (SWANK::EVAL-REGION "(add-log-entry :severity 2 :host-uuid >>>>> \"123\" >>>>> :timestamp (get-universal-time)\n :program >>>>> \"blah\" :text \"test entry\")\n") >>>>> 6: ((LAMBDA ())) >>>>> 7: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {10037C1BF9}>) >>>>> 8: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation >>>>> request." #<CLOSURE (LAMBDA #) {1003636499}>) >>>>> 9: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #) >>>>> {1003636469}>) >>>>> 10: (SWANK::REPL-EVAL "(add-log-entry :severity 2 :host-uuid >>>>> \"123\" >>>>> :timestamp (get-universal-time)\n :program \"blah\" >>>>> :text \"test entry\")\n") >>>>> 11: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:LISTENER-EVAL "(add-log- >>>>> entry >>>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n >>>>> :program \"blah\" :text \"test entry\")\n"$ >>>>> 12: (SWANK::EVAL-FOR-EMACS (SWANK:LISTENER-EVAL "(add-log-entry >>>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n >>>>> :program \"blah\" :text \"test entry\")\n") "CONSO$ >>>>> 13: (SWANK::PROCESS-REQUESTS NIL) >>>>> 14: ((LAMBDA ())) >>>>> 15: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION >>>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {10038AD269}>) >>>>> 16: (SWANK::CALL-WITH-REDIRECTED-IO #<SWANK::CONNECTION >>>>> {10032FB281}> >>>>> #<CLOSURE (LAMBDA #) {10038AD289}>) >>>>> 17: (SWANK::CALL-WITH-CONNECTION #<SWANK::CONNECTION {10032FB281}> >>>>> #<CLOSURE (LAMBDA #) {10038AD269}>) >>>>> 18: (SWANK::HANDLE-REQUESTS #<SWANK::CONNECTION {10032FB281}> NIL) >>>>> 19: (SWANK::CALL-WITH-BINDINGS NIL #<CLOSURE (LAMBDA #) >>>>> {10034F5189}>) >>>>> 20: ((FLET SB-THREAD::WITH-MUTEX-THUNK)) >>>>> 21: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]477)) >>>>> 22: (SB-THREAD::CALL-WITH-MUTEX ..) >>>>> 23: ((LAMBDA ())) >>>>> 24: ("foreign function: call_into_lisp") >>>>> 25: ("foreign function: new_thread_trampoline") >>>>> >>>>> >>>>> Ian Eslick wrote: >>>>>> Also try pushing :elephant-without-optimize onto *features* >>>>>> prior to >>>>>> doing a force rebuild of elephant. This should give you some more >>>>>> information and rule out any optimization/declaration related >>>>>> bugs. >>>>>> >>>>>> Thank you, >>>>>> Ian >>>>>> >>>>>> >>>>>> On Jan 7, 2009, at 9:10 PM, Kevin Raison wrote: >>>>>> >>>>>>> I was finally able to recreate this in the repl (as opposed to >>>>>>> seeing it >>>>>>> in my error logs), so here is a trace: >>>>>>> >>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object >>>>>>> #<BDB-INDEXED-BTREE oid:2>. >>>>>>> [Condition of type UNBOUND-SLOT] >>>>>>> >>>>>>> Backtrace: >>>>>>> 0: ((SB-PCL::FAST-METHOD SLOT-UNBOUND (T T T)) #<unavailable >>>>>>> argument> #<unavailable argument> #<unavailable argument> >>>>>>> #<BDB-INDEXED-BTREE oid:2> DB-BDB::INDICES-CACHE) >>>>>>> 1: (SB-PCL::SLOT-UNBOUND-INTERNAL #<BDB-INDEXED-BTREE oid:2> 2) >>>>>>> 2: ((SB-PCL::FAST-METHOD (SETF GET-VALUE) (T T >>>>>>> DB-BDB::BDB-INDEXED-BTREE)) #<unavailable lambda list>) >>>>>>> 3: ((LAMBDA (WORD)) "34766") >>>>>>> 4: (SB-IMPL::MAP1 #<CLOSURE (LAMBDA #) {1003CA4ED9}> (("" "asa" >>>>>>> "106007" "deny" "inbound" "udp" ...)) :LIST T) >>>>>>> 5: (MAPCAR #<CLOSURE (LAMBDA #) {1003CA4ED9}> ("" "asa" "106007" >>>>>>> "deny" "inbound" "udp" ...))[:EXTERNAL] >>>>>>> 6: ((SB-PCL::FAST-METHOD INDEX-LOG-ENTRY (LOG-ENTRY)) >>>>>>> #<unavailable >>>>>>> argument> #<unavailable argument> #<LOG-ENTRY oid:6373>) >>>>>>> 7: ((LAMBDA ())) >>>>>>> 8: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION >>>>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unavailable argument> >>>>>>> #<unavailable >>>>>>> argument> #<unavailable argument> #<unavailable argument>) >>>>>>> [:EXTERNAL] >>>>>>> 9: (ADD-LOG-ENTRY ..) >>>>>>> 10: (SB-INT:SIMPLE-EVAL-IN-LEXENV ..) >>>>>>> >>>>>>> And here is the code that sometimes causes the issue: >>>>>>> (defmethod index-log-entry ((log-entry log-entry)) >>>>>>> (let ((*store-controller* *syslog-controller*)) >>>>>>> (let ((btree (get-from-root "log-entry-index" :sc >>>>>>> *syslog-controller*))) >>>>>>> (map-words #'(lambda (word) >>>>>>> (unless (or (member word *superfluous- >>>>>>> words* :test >>>>>>> #'string-equal) >>>>>>> (< (length word) 2)) >>>>>>> (unless (existsp word btree) >>>>>>> (setf (get-value word btree) (make- >>>>>>> pset :sc >>>>>>> *syslog-controller*))) >>>>>>> (insert-item log-entry >>>>>>> (get-value word btree)))) >>>>>>> (text log-entry))))) >>>>>>> >>>>>>> >>>>>>> If I drop and recreate the btree, everything runs smoothly (no >>>>>>> errors) >>>>>>> for a few hours and then the above error message starts showing >>>>>>> up >>>>>>> about >>>>>>> half to three quarters of the time. >>>>>>> >>>>>>> Thanks again, >>>>>>> Kevin >>>>>>> >>>>>>> >>>>>>> Kevin Raison wrote: >>>>>>>> I am seeing an intermittent error with 1.0 alpha when trying to >>>>>>>> write to >>>>>>>> an indexed btree (using BerkeleyDB 4.7 as provided by Ubuntu's >>>>>>>> package >>>>>>>> repositories): >>>>>>>> >>>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object >>>>>>>> #<BDB-INDEXED-BTREE oid:2> >>>>>>>> >>>>>>>> Within the same thread, sometimes this happens and sometimes I >>>>>>>> am >>>>>>>> able >>>>>>>> to read and write to the btree. I am using sbcl 1.0.24 on 32 >>>>>>>> bit >>>>>>>> intel >>>>>>>> linux. Is there something obvious that might help alleviate >>>>>>>> this >>>>>>>> or >>>>>>>> should I provide more context? >>>>>>>> >>>>>>>> Thanks! >>>>>>>> Kevin >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> elephant-devel site list >>>>>>>> elephant-devel@common-lisp.net >>>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel >>>>>>> _______________________________________________ >>>>>>> elephant-devel site list >>>>>>> elephant-devel@common-lisp.net >>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel >>>>>> _______________________________________________ >>>>>> elephant-devel site list >>>>>> elephant-devel@common-lisp.net >>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel >>>>> _______________________________________________ >>>>> elephant-devel site list >>>>> elephant-devel@common-lisp.net >>>>> http://common-lisp.net/mailman/listinfo/elephant-devel >>>> _______________________________________________ >>>> elephant-devel site list >>>> elephant-devel@common-lisp.net >>>> http://common-lisp.net/mailman/listinfo/elephant-devel >>> _______________________________________________ >>> elephant-devel site list >>> elephant-devel@common-lisp.net >>> http://common-lisp.net/mailman/listinfo/elephant-devel >> _______________________________________________ >> elephant-devel site list >> elephant-devel@common-lisp.net >> http://common-lisp.net/mailman/listinfo/elephant-devel > > > _______________________________________________ > elephant-devel site list > elephant-devel@common-lisp.net > http://common-lisp.net/mailman/listinfo/elephant-devel _______________________________________________ elephant-devel site list elephant-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/elephant-devel