The other case I'm testing has five necessary namespaces. :(

10000: 6462ms
20000: 7592ms
30000: 8689ms
40000: 9417ms
50000: 9566ms
60000: 10368ms
70000: 10963ms
80000: 12167ms

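For reference, per-batch figures like the ones above can be collected by wrapping the add loop in a small timer. What follows is only a sketch under assumptions: `BatchTimer` and `timeBatches` are hypothetical names, and the workload in `main` is a stand-in rather than the BaseX `Add` command from the quoted script, so it runs without any BaseX dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical helper: times a repeated operation in fixed-size batches. */
class BatchTimer {

  /**
   * Calls op for i = 0 .. total-1 and returns the elapsed milliseconds
   * of each batch of batchSize calls. A trailing partial batch is
   * reported as well.
   */
  static List<Long> timeBatches(int total, int batchSize, IntConsumer op) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Long> millis = new ArrayList<>();
    long start = System.nanoTime();
    for (int i = 0; i < total; i++) {
      op.accept(i);
      boolean batchDone = (i + 1) % batchSize == 0;
      boolean lastCall = i + 1 == total;
      if (batchDone || lastCall) {
        long now = System.nanoTime();
        millis.add((now - start) / 1_000_000);
        start = now; // the next batch is measured from here
      }
    }
    return millis;
  }

  public static void main(String[] args) {
    // Stand-in workload (an assumption). In a real test this lambda
    // would execute the add command against the database instead.
    StringBuilder sink = new StringBuilder();
    List<Long> millis = timeBatches(80_000, 10_000, i -> sink.append(i % 10));
    for (int b = 0; b < millis.size(); b++) {
      System.out.println((b + 1) * 10_000 + ": " + millis.get(b) + "ms");
    }
  }
}
```

Each value is the time for one batch only, not a running total, so a flat series means constant insert cost and a rising series means that inserts get slower as the database grows.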

Is there any direction you can suggest to look for a workaround?

On Tue, Sep 23, 2014 at 1:43 PM, Christian Grün <christian.gr...@gmail.com> wrote:
> > This namespace happens to be unnecessary, but others won't be. I'm so
> > curious how this can be the thing.
>
> Unfortunately, the intricacies of namespaces have been keeping us XML
> implementers busy for a long time, and the XPath and storage
> algorithms would be much simpler, if not trivial, without the notion
> of namespaces. This is why it would take quite a while to explain what
> are the reasons for that, and as your input document only contains one
> namespaces, I'm not surprised that you are surprised ;) To put it in a
> nutshell: it's usually easy to optimize single namespaces issues, but
> it's difficult to optimize all cases that happen in practice.
>
> But I'll keep track of your use case.
>
>
> On Tue, Sep 23, 2014 at 1:30 PM, Gerald de Jong <ger...@delving.eu> wrote:
> >
> > On Tue, Sep 23, 2014 at 1:20 PM, Gerald de Jong <ger...@delving.eu> wrote:
> >>
> >> WOW, really... the namespace? Because it's unused, or is it always going
> >> to slow when there are namespaces?
> >>
> >> On Tue, Sep 23, 2014 at 1:13 PM, Christian Grün
> >> <christian.gr...@gmail.com> wrote:
> >>>
> >>> Thanks for the document. The declaration of the (unused) namespace in
> >>> the root element seems to be the cause for the decreasing performance
> >>> (I noticed that the time for adding documents stays constant after
> >>> removing the declaration). I'll do some profiling in order to find out
> >>> if this can be sped up without too much effort (it may take a while,
> >>> though, because I'll be on leave for a while from tomorrow).
> >>>
> >>>
> >>> On Tue, Sep 23, 2014 at 12:25 PM, Gerald de Jong <ger...@delving.eu>
> >>> wrote:
> >>> > I don't know what causes the gradual slowdown.
> >>> > My assumption was that it was the "optimize" which would cause the
> >>> > index to be built, so I didn't expect a slowdown at all during "add"
> >>> > calls, especially when autoflush is false.
> >>> >
> >>> > I add documents with the following paths:
> >>> >
> >>> > /f/f/e/ffe0f5be2aa14e81050f759c8f9c3eb7.xml
> >>> >
> >>> > The xml file name is a hash of the contents, and it is placed in a path
> >>> > such that the export spreads out the files nicely into a file system
> >>> > tree, rather than putting a million docs into one directory.
> >>> >
> >>> > The document content is nothing special, wrapped in a special tag:
> >>> >
> >>> > <narthex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >>> >          id="20412518"
> >>> >          mod="2014-09-23T11:11:51.007+02:00">
> >>> >   <record>
> >>> >     <priref>20412518</priref>
> >>> >     <current_location>FTA</current_location>
> >>> >     <current_location.type/>
> >>> >     <description>Ingang op de binnenplaats van de zuidvleugel</description>
> >>> >     <collection>Fotocollectie</collection>
> >>> >     <production.date.start>1925-08-06</production.date.start>
> >>> >     <reproduction.format/>
> >>> >     <reproduction.reference>2186abf4-7108-f9b8-ffbb-902881afe836</reproduction.reference>
> >>> >     <creator.role>Fotograaf</creator.role>
> >>> >     <object_number>9.387</object_number>
> >>> >     <monument.label/>
> >>> >     <monument.zipcode/>
> >>> >     <monument.name>Kasteel Hoensbroek</monument.name>
> >>> >     <monument.record_number>284330</monument.record_number>
> >>> >     <reproduction.date/>
> >>> >     <reproduction.notes>Oude filepath: 0009\009387.jpg</reproduction.notes>
> >>> >     <reproduction.type/>
> >>> >     <reproduction.creator/>
> >>> >     <rights.type>Copyright</rights.type>
> >>> >     <technique>Neg.zw</technique>
> >>> >     <creator>Scheepens, W.C.L.A.</creator>
> >>> >     <order_number>avh04-2008</order_number>
> >>> >     <input.date>2008-04-01</input.date>
> >>> >     <edit.date>2011-05-03</edit.date>
> >>> >     <edit.date>2008-04-28</edit.date>
> >>> >     <monument.historical_address/>
> >>> >     <content.subject.type value="SUBJECT" option="SUBJECT">
> >>> >       <text language="0">subject</text>
> >>> >       <text language="1">onderwerp</text>
> >>> >       <text language="2">sujet</text>
> >>> >       <text language="3">Thema</text>
> >>> >       <text language="4">موضوع</text>
> >>> >       <text language="6">θέμα</text>
> >>> >     </content.subject.type>
> >>> >     <content.subject.type value="SUBJECT" option="SUBJECT">
> >>> >       <text language="0">subject</text>
> >>> >       <text language="1">onderwerp</text>
> >>> >       <text language="2">sujet</text>
> >>> >       <text language="3">Thema</text>
> >>> >       <text language="4">موضوع</text>
> >>> >       <text language="6">θέμα</text>
> >>> >     </content.subject.type>
> >>> >     <content.subject>Kasteel</content.subject>
> >>> >     <content.subject>Binnenplaats</content.subject>
> >>> >     <monument.province>Limburg</monument.province>
> >>> >     <monument.place>Hoensbroek</monument.place>
> >>> >     <monument.number/>
> >>> >     <monument.county/>
> >>> >     <monument.country>Nederland</monument.country>
> >>> >     <monument.house_number>18</monument.house_number>
> >>> >     <monument.street>Klinkertstraat</monument.street>
> >>> >     <monument.house_number.addition/>
> >>> >     <monument.complex_number/>
> >>> >     <monument.number.x_coordinates/>
> >>> >     <monument.number.y_coordinates/>
> >>> >     <monument.geographical_keyword/>
> >>> >     <monument.complex_number.x_coordinates/>
> >>> >     <monument.complex_number.y_coordinates/>
> >>> >     <creator.date_of_birth/>
> >>> >     <creator.date_of_death/>
> >>> >     <input.name>a.vanhoute</input.name>
> >>> >     <edit.name>RCEadmin</edit.name>
> >>> >     <edit.name>a.vanhoute</edit.name>
> >>> >     <creator.history/>
> >>> >     <record_type value="OBJECT" option="OBJECT">
> >>> >       <text language="0">single object</text>
> >>> >       <text language="2">objet individuel</text>
> >>> >       <text language="3">Einzelnes Objekt</text>
> >>> >     </record_type>
> >>> >     <edit.time>03:10:32</edit.time>
> >>> >     <edit.time>11:17:08</edit.time>
> >>> >     <input.time>09:58:28</input.time>
> >>> >     <input.source>document>photographs</input.source>
> >>> >     <edit.source>collect>photograph</edit.source>
> >>> >     <edit.source>document>photographs</edit.source>
> >>> >   </record>
> >>> > </narthex>
> >>> >
> >>> > On Tue, Sep 23, 2014 at 11:36 AM, Christian Grün
> >>> > <christian.gr...@gmail.com> wrote:
> >>> >>
> >>> >> > I set up to use the 8.0-SNAPSHOT and used the internal parser as
> >>> >> > well. In your example you're not really giving much of a challenge
> >>> >> > to the index, since every doc is just <a/>.
> >>> >>
> >>> >> If I get it right, you assume the slowdown is due to the index
> >>> >> structures?
> >>> >>
> >>> >> > With respect to ADD, I'm not seeing a significant performance
> >>> >> > difference:
> >>> >>
> >>> >> Please give us more info on the data you are adding. Could you provide
> >>> >> us with a sample document?
> >>> >>
> >>> >>
> >>> >> > 8.0-SNAPSHOT
> >>> >> > -------
> >>> >> > 10000: 9250ms
> >>> >> > 20000: 7626ms
> >>> >> > 30000: 7885ms
> >>> >> > 40000: 8111ms
> >>> >> > 50000: 8365ms
> >>> >> > 60000: 8784ms
> >>> >> > 70000: 9270ms
> >>> >> > 80000: 9692ms
> >>> >> > 90000: 10158ms
> >>> >> > 100000: 10612ms
> >>> >> > 110000: 11018ms
> >>> >> > 120000: 11478ms
> >>> >> > 130000: 11940ms
> >>> >> > 140000: 12505ms
> >>> >> > 150000: 13047ms
> >>> >> > 160000: 13536ms
> >>> >> > 170000: 14055ms
> >>> >> > 180000: 14371ms
> >>> >> > 190000: 14883ms
> >>> >> > 200000: 15330ms
> >>> >> > 210000: 15888ms
> >>> >> > 220000: 16398ms
> >>> >> > 230000: 16878ms
> >>> >> > 240000: 17038ms
> >>> >> > 250000: 17453ms
> >>> >> > 260000: 17965ms
> >>> >> > 270000: 18317ms
> >>> >> > 280000: 18832ms
> >>> >> > 290000: 19373ms
> >>> >> > 300000: 19735ms
> >>> >> > 310000: 20062ms
> >>> >> > 320000: 20675ms
> >>> >> > 330000: 21113ms
> >>> >> > 340000: 21754ms
> >>> >> > 350000: 22887ms
> >>> >> > 360000: 22810ms
> >>> >> > 370000: 22985ms
> >>> >> > 380000: 23506ms
> >>> >> > 390000: 23856ms
> >>> >> > 400000: 24338ms
> >>> >> >
> >>> >> > 7.9
> >>> >> > -----
> >>> >> > 10000: 8229ms
> >>> >> > 20000: 7587ms
> >>> >> > 30000: 7973ms
> >>> >> > 40000: 8282ms
> >>> >> > 50000: 8717ms
> >>> >> > 60000: 9294ms
> >>> >> > 70000: 10105ms
> >>> >> > 80000: 10669ms
> >>> >> > 90000: 11301ms
> >>> >> > 100000: 11835ms
> >>> >> > 110000: 12413ms
> >>> >> > 120000: 13000ms
> >>> >> > 130000: 13577ms
> >>> >> > 140000: 14331ms
> >>> >> > 150000: 14488ms
> >>> >> > 160000: 15025ms
> >>> >> > 170000: 15463ms
> >>> >> > 180000: 15815ms
> >>> >> > 190000: 16153ms
> >>> >> > 200000: 16314ms
> >>> >> > 210000: 16562ms
> >>> >> > 220000: 17186ms
> >>> >> > 230000: 17862ms
> >>> >> > 240000: 18340ms
> >>> >> > 250000: 18790ms
> >>> >> > 260000: 19313ms
> >>> >> > 270000: 19850ms
> >>> >> > 280000: 20225ms
> >>> >> > 290000: 20650ms
> >>> >> > 300000: 21062ms
> >>> >> > 310000: 21595ms
> >>> >> > 320000: 22022ms
> >>> >> > 330000: 22414ms
> >>> >> > 340000: 22925ms
> >>> >> > 350000: 23514ms
> >>> >> > 360000: 23762ms
> >>> >> > 370000: 24360ms
> >>> >> > 380000: 25028ms
> >>> >> > 390000: 25446ms
> >>> >> > 400000: 25700ms
> >>> >> >
> >>> >> > - Gerald de Jong
> >>> >> >
> >>> >> >
> >>> >> > On Thu, Sep 18, 2014 at 6:57 PM, Christian Grün
> >>> >> > <christian.gr...@gmail.com> wrote:
> >>> >> >>
> >>> >> >> > Perhaps you can give me a hint as to why inserts slow down.j
> >>> >> >> I didn't have time to check out 7.9, but I have done some testing
> >>> >> >> with 8.0, and I didn't notice a real slow-down. This is Java testing
> >>> >> >> script (1 mio documents are added in just 17 seconds; I'm using the
> >>> >> >> internal BaseX parser to speed up the import):
> >>> >> >>
> >>> >> >> Performance p = new Performance();
> >>> >> >> Context ctx = new Context();
> >>> >> >>
> >>> >> >> new CreateDB("db").execute(ctx);
> >>> >> >> new Set(MainOptions.AUTOFLUSH, false).execute(ctx);
> >>> >> >> new Set(MainOptions.INTPARSE, true).execute(ctx);
> >>> >> >> for(int i = 0; i < 1000000; i++) {
> >>> >> >>   new Add("db", "<a/>").execute(ctx);
> >>> >> >> }
> >>> >> >> ctx.close();
> >>> >> >> System.out.println(p);
> >>> >> >>
> >>> >> >> Hope this helps,
> >>> >> >> Christian
> >>> >> >
> >>> >> > --
> >>> >> > Delving BV, Vasteland 8, Rotterdam
> >>> >> > http://www.delving.eu
> >>> >> > http://twitter.com/fluxe
> >>> >> > skype: beautifulcode
> >>> >> > +31629339805

--
Delving BV, Vasteland 8, Rotterdam
http://www.delving.eu
http://twitter.com/fluxe
skype: beautifulcode
+31629339805