Re: [Neo4j] Lucene index commit rate and NoSuchElementException

Tobias Ivarsson Tue, 01 Feb 2011 13:51:08 -0800

That is correct: the Isolation property of ACID means that data isn't
visible to other threads until after commit.


The CHM should not replace the index check, though. You want to limit the
number of items in the CHM so that it only reflects the elements currently
being worked on; the index check should still be there for elements
processed before.
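
A minimal sketch of that combination, not code from this thread: the class name ClaimingInserter, the method tryInsert and the in-memory `committed` set are hypothetical stand-ins. In real code the `committed` lookup would be the query against the Lucene index, and the placeholder comment would be the Neo4j transaction that creates and indexes the node.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: claim a key with putIfAbsent, then still check what was committed earlier. */
class ClaimingInserter {
    // Keys some thread is working on right now. Entries are removed when the
    // work finishes, so the map stays small.
    private final ConcurrentHashMap<String, Object> inFlight = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the committed index (assumption: in real code
    // this is an index lookup, not an in-memory set).
    private final Set<String> committed = ConcurrentHashMap.newKeySet();

    /** Returns true if this call inserted the key, false if it was skipped. */
    boolean tryInsert(String md5) {
        // putIfAbsent returns null when no mapping existed, i.e. this thread
        // won the claim. Any non-null result means another thread holds it.
        if (inFlight.putIfAbsent(md5, new Object()) != null) {
            return false;
        }
        try {
            // The index check stays: it covers elements processed before.
            if (committed.contains(md5)) {
                return false;
            }
            // Placeholder for: begin transaction, create node, index it, commit.
            committed.add(md5);
            return true;
        } finally {
            // Release the claim only after the (placeholder) commit, so the
            // next claimant of this key sees the committed entry.
            inFlight.remove(md5);
        }
    }

    /** Number of keys currently claimed; useful for checking that claims are released. */
    int inFlightCount() {
        return inFlight.size();
    }
}
```

The claim is released in a finally block so that a failed insert does not leave the key stuck in the map.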

-tobias

On Tue, Feb 1, 2011 at 10:25 PM, Michael Hunger <michael.hun...@neotechnology.com> wrote:

> What about batch insertion of the nodes and indexing them after the fact?
>
> And I agree with Tobias that a CHM should be a better claim-checking
> algorithm than using indexing for that. The index entries, as well as the
> inserted nodes, will only be visible to other threads after the commit
> (ACID; Tobias, please correct me if I'm wrong), so it is surely possible
> that you accidentally insert the same data twice.
>
> Cheers
>
> Michael
>
> On 01.02.2011 at 22:19, Tobias Ivarsson wrote:
>
> > No, it means that you have to synchronize the threads so that they don't
> > insert the same data concurrently.
> > Perhaps a ConcurrentHashMap<MD5,token> where you would
> > putIfAbsent(md5, new Object()) when you start working on a new hash. If
> > putIfAbsent returns null, your token went in and this thread owns the md5.
> > If it returns some other token, you know that another thread is working
> > on that md5, which means this thread should move on to another one. When
> > the transaction is done you remove the md5 from the Map, to ensure that
> > you don't leak memory.
> >
> > That's a simple "locking on arbitrary key" implementation. The reason you
> > cannot just do synchronized(md5) {...} is of course that your hashes are
> > computed, and thus will not be the same object every time, even though
> > they are equals().
> >
> > For getting a performance boost out of writes, doing multiple operations
> > in one transaction will give a much bigger gain than multiple threads,
> > though. For your use case, I think two writer threads and a few hundred
> > elements per transaction is an appropriate size.
> >
> > -tobias
> >
> > On Tue, Feb 1, 2011 at 9:06 PM, Massimo Lusetti <mluse...@gmail.com> wrote:
> >
> >> On Tue, Feb 1, 2011 at 8:02 PM, Mattias Persson
> >> <matt...@neotechnology.com> wrote:
> >>
> >>> Seems a little weird, the commit rate won't affect the end result,
> >>> just performance (more operations per commit means faster
> >>> performance). Your code seems correct for single threaded use btw.
> >>
> >> Does it mean that I cannot access the graphdb from multiple threads?
> >> That code is in a singleton service which exposes the
> >> GraphDatabaseService through a method addNode(), from where I run that
> >> code.
> >>
> >> The singleton service is called by a thread pool which can run at
> >> most 20 concurrent threads.
> >>
> >> Any hints are really appreciated.
> >>
> >> Cheers
> >> --
> >> Massimo
> >> http://meridio.blogspot.com
> >> _______________________________________________
> >> Neo4j mailing list
> >> User@lists.neo4j.org
> >> https://lists.neo4j.org/mailman/listinfo/user
> >>
> >
> >
> >
> > --
> > Tobias Ivarsson <tobias.ivars...@neotechnology.com>
> > Hacker, Neo Technology
> > www.neotechnology.com
> > Cellphone: +46 706 534857



-- 
Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857