Hi, Andrea.

I've just (finally) looked at applying your multi-threaded indexing patch
and it seems to be very far from vanilla DSpace 5.x code (5.5 here) and is
therefore impossible to apply without extensive edits.

This is a shame because DSpace indexing is extremely slow even on absurdly
fast hardware. Every machine we run DSpace on has fast SSD storage, 8GB+ of
RAM, 4+ CPUs, etc., yet the indexing is single-threaded and linear. As the
number of items in existing DSpace instances grows, this will become a huge
limiting factor in infrastructure. I think vanilla DSpace needs to be
better about this. We're currently at 55,000 items and a full Discovery
index takes 1–2 hours! I can only imagine the pain felt by people working
on DSpace instances with hundreds of thousands of items. As a developer I
perform full indexes more often, on various test instances, and this is
quite annoying. :)
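
For anyone following along, the general shape of multi-threaded indexing can
be sketched as below. This is only an illustration under assumptions, not
Andrea's patch and not DSpace code: the class `ParallelIndexSketch`, the
method `indexAll`, and the `indexOne` callback are hypothetical names, and the
callback stands in for whatever actually writes one item to Solr. It assumes
Java 8 or later. As Andrea notes further down, each thread needs a database
connection, so the pool size has to fit within what the connection pool can
spare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelIndexSketch {

    /**
     * Runs indexOne for every id on a fixed pool of worker threads and
     * returns how many ids were processed without throwing.
     * (Hypothetical helper; indexOne stands in for the real per-item work.)
     */
    public static <T> int indexAll(List<T> ids, int threads, Consumer<T> indexOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            // Queue one task per item; the pool spreads them over its threads.
            for (T id : ids) {
                pending.add(pool.submit(() -> indexOne.accept(id)));
            }
            int ok = 0;
            // Wait for every task; a failure on one item does not stop the rest.
            for (Future<?> f : pending) {
                try {
                    f.get();
                    ok++;
                } catch (ExecutionException e) {
                    System.err.println("indexing failed: " + e.getCause());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return ok;
        } finally {
            // Stop accepting new work; already-submitted tasks may still finish.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        int done = indexAll(ids, 4, id -> System.out.println("indexed item " + id));
        System.out.println(done + " items indexed");
    }
}
```

With `threads` set to 1 the tasks run one after another in submission order,
which matches the current linear behaviour and makes it easy to compare
timings.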

Regards,

On Wed, Aug 17, 2016 at 12:06 AM Alan Orth <alan.o...@gmail.com> wrote:

> Wow, that's great, Andrea. I'm very curious to try your patches. I
> will play with them and see if I can get them to apply to our slightly
> modified DSpace 5.1 code base.
>
> Cheers,
>
> On Fri, Aug 12, 2016 at 8:53 PM, Andrea Bollini
> <andrea.boll...@4science.it> wrote:
> > Dear Alan,
> >
> > In DSpace-CRIS we have made the indexing process multi-threaded, and in
> > our experience this improves performance a lot: 10x or more, depending
> > on the number of threads used and the server configuration.
> >
> > See
> >
> > https://github.com/4Science/DSpace/commit/6206ca6f7980cdae31d5bd69c450d706f8518dfb
> >
> > https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L476
> >
> > https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L2585
> >
> >
> > Running Solr on a separate Tomcat, or better, on a dedicated server, also
> > brings big improvements. When you use multiple threads you need to be sure
> > to have enough database connections to serve all the threads and the
> > running webapps.
> >
> > If you are available to test this improvement on a plain DSpace 5.x, we
> > can prepare a pull request (against DSpace 5.x), and if the feedback is
> > good we can also port it to DSpace 6.x.
> >
> > BTW, the DSpace-CRIS enhancement also introduces the ability to (re)index
> > a single object, so it can also replace the pull request
> > https://github.com/DSpace/DSpace/pull/1469
> >
> > Best,
> > Andrea
> >
> > On 12/08/2016 15:33, Alan Orth wrote:
> >> Evelthon,
> >>
> >> Interesting observation about the indexing speed. Just yesterday I
> >> posted a message about Java JVM settings for Solr/Lucene to this
> >> mailing list. I'm sure there is room for improvements in Solr
> >> performance if you're willing to monitor, tweak, monitor, tweak, etc.
> >> I've stayed away from JVM tuning for the most part. Here's the link I
> >> posted yesterday, from a Solr developer, where he recommends some JVM
> >> settings as well as Java 8 (which wasn't the case when I checked this
> >> wiki last year):
> >>
> >> https://wiki.apache.org/solr/ShawnHeisey
> >>
> >> For what it's worth, our indexing takes ~60 minutes for 55,000 items,
> >> and we're on a Linode VPS where we have an SSD and plenty of CPU cores
> >> and memory — I hate to think how long it takes on less performant
> >> hardware.
> >>
> >> Regards,
> >>
> >>
> >> On Fri, Aug 12, 2016 at 10:56 AM, Evelthon Prodromou
> >> <prodromou.evelt...@ucy.ac.cy> wrote:
> >>> Hello Alan,
> >>>
> >>> Basically, to finish the initial indexing and media-filter faster.
> >>> Probably overkill.
> >>>
> >>>
> >>> Thanks.
> >>>
> >>> On Friday, August 12, 2016 at 10:20:49 AM UTC+3, Alan Orth wrote:
> >>>> I'm glad you solved it, Evelthon.
> >>>>
> >>>> I guess it depends on your OS and how you have Tomcat running. In
> >>>> Ubuntu we set JAVA_OPTS in /etc/default/tomcat7, but CentOS's Tomcat
> >>>> is surely different. By the way, there's more discussion about tuning
> >>>> DSpace (including JAVA_OPTS and CATALINA_OPTS) on the wiki:
> >>>>
> >>>> https://wiki.duraspace.org/display/DSDOC5x/Performance+Tuning+DSpace
> >>>>
> >>>> I still wonder why your JVM settings are so highly tweaked. Most
> >>>> people don't need to adjust those, and unless you really know you need
> >>>> them, I'd say to leave them off. Remember, "premature optimization is
> >>>> the root of all evil" ;)
> >>>>
> >>>> Cheers,
> >>>>
> >>>> On Fri, Aug 12, 2016 at 9:59 AM, Evelthon Prodromou
> >>>> <prodromou...@ucy.ac.cy> wrote:
> >>>>> Hello Luigi,
> >>>>>
> >>>>> CATALINA_OPTS did it. Everything works well now. It seems Solr was
> >>>>> choking and causing slow loading in the UI.
> >>>>>
> >>>>> Curious though, shouldn't it fall back to JAVA_OPTS?
> >>>>>
> >>>>>
> >>>>> In any case, thank you.
> >>>>>
> >>>>> Evelthon
> >>>>>
> >>>>>
> >>>>> On Thursday, August 11, 2016 at 1:01:34 PM UTC+3, Luigi Andrea Pascarelli wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> As far as I know, DSpace doesn't need a large amount of memory. As a
> >>>>>> first step you can try setting CATALINA_OPTS to
> >>>>>>
> >>>>>> CATALINA_OPTS: -Xms1024m -Xmx2048m -XX:MaxPermSize=256m
> >>>>>> -Dfile.encoding=UTF-8
> >>>>>>
> >>>>>> And, as Alan highlighted, you could use JAVA_OPTS with less memory.
> >>>>>>
> >>>>>> As a second step you could check whether I/O is the real issue. Maybe
> >>>>>> PostgreSQL, Solr, and Tomcat read and write on the same node, in which
> >>>>>> case you could benefit from separating them (or one of them) onto
> >>>>>> different volumes.
> >>>>>> Let me know.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Luigi Andrea
> >>>>>>
> >>>>>>
> >>>>>> On 11/08/2016 11:46, Evelthon Prodromou wrote:
> >>>>>>
> >>>>>> The server has 32GB, and PostgreSQL is on a different box. I don't think
> >>>>>> it's a RAM issue.
> >>>>>>
> >>>>>> I am wondering if it is a Solr issue. I don't have Tomcat running as
> >>>>>> user dspace. Instead, I changed ownership of [dspace]/solr to
> >>>>>> dspace:tomcat and gave rw rights to both user and group.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thursday, August 11, 2016 at 12:33:05 PM UTC+3, Alan Orth wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> From your JAVA_OPTS I see you are allocating 4096 + 2048 megabytes of
> >>>>>>> RAM to Tomcat right from the start. How much memory does your server
> >>>>>>> have? This means your host must have AT LEAST 6GB of RAM just for
> >>>>>>> Tomcat, let alone PostgreSQL, Solr, and the rest of the operating
> >>>>>>> system. I wouldn't be surprised if you are encountering poor
> >>>>>>> performance due to swapping.
> >>>>>>>
> >>>>>>> For reference, we run a fairly large DSpace instance with ~55,000
> >>>>>>> items and a decent amount of traffic and these are our JAVA_OPTS:
> >>>>>>>
> >>>>>>> -Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:MaxPermSize=256m
> >>>>>>> -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8
> >>>>>>>
> >>>>>>> Our server has 8GB of physical memory. Unless you know you need all
> >>>>>>> those JVM tweaks, I'd start by reducing your JAVA_OPTS to something
> >>>>>>> simpler (for testing at least).
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> On Wed, Aug 10, 2016 at 7:03 PM, Evelthon Prodromou
> >>>>>>> <prodromou...@ucy.ac.cy> wrote:
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I seem to be having an issue with Tomcat. It takes 38+ seconds to load
> >>>>>>>> pages. I believe it's Tomcat since I notice shell scripts
> >>>>>>>> ([dspace]/bin/dspace) executing slowly when Tomcat is started, and
> >>>>>>>> very fast (normal, I presume) when Tomcat is stopped.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The system is a new installation of DSpace 5.5 on CentOS 7. Data and
> >>>>>>>> SQL were migrated from a 1.7.0 installation.
> >>>>>>>>
> >>>>>>>> tomcat.conf includes the following JAVA_OPTS
> >>>>>>>>
> >>>>>>>> JAVA_OPTS="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m
> >>>>>>>> -Dfile.encoding=UTF-8
> >>>>>>>> -XX:MaxHeapFreeRatio=70 -XX:+UseConcMarkSweepGC
> >>>>>>>> -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled"
> >>>>>>>>
> >>>>>>>> index-discovery was executed and discovery facets show up.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Someone please point me in the right direction to investigate.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thank you,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Evelthon
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> You received this message because you are subscribed to the Google
> >>>>>>>> Groups
> >>>>>>>> "DSpace Technical Support" group.
> >>>>>>>> To unsubscribe from this group and stop receiving emails from it,
> >>>>>>>> send
> >>>>>>>> an
> >>>>>>>> email to dspace-tech...@googlegroups.com.
> >>>>>>>> To post to this group, send email to dspac...@googlegroups.com.
> >>>>>>>> Visit this group at https://groups.google.com/group/dspace-tech.
> >>>>>>>> For more options, visit https://groups.google.com/d/optout.
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Alan Orth
> >>>>>>> alan...@gmail.com
> >>>>>>> https://englishbulgaria.net
> >>>>>>> https://alaninkenya.org
> >>>>>>> https://mjanja.ch
> >>>>>>> "In heaven all the interesting people are missing." ―Friedrich
> >>>>>>> Nietzsche
> >>>>>>> GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Luigi Andrea Pascarelli
> >>>>>>
> >>>>>> DSpace Committer and DSpace-CRIS Lead Developer
> >>>>>>
> >>>>>> 4Science,  www.4science.it (an Itway Group Company)
> >>>>>>
> >>>>>> office: Via Edoardo D'Onofrio 304, 00155 Roma, Italy
> >>>>>> tel: +39 333 934 1782
> >>>>>> skype: l_a_p82
> >>>>>> linkedin: luigiandreapascarelli
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> >
> > --
> > Andrea Bollini
> > Chief Technology Innovation Officer
> >
> > 4Science,  www.4science.it
> > office: Via Edoardo D'Onofrio 304, 00155 Roma, Italy
> > mobile: +39 333 934 1808
> > skype: a.bollini
> > linkedin: andreabollini
> > orcid: 0000-0002-9029-1854
> >
> > an Itway Group Company
> > Italy, France, Spain, Portugal, Greece, Turkey, Lebanon, Qatar,
> > U.A.Emirates
>
>
>
>
-- 

Alan Orth
alan.o...@gmail.com
https://englishbulgaria.net
https://alaninkenya.org
https://mjanja.ch
