Dear Mark,

After exporting a slice of my 2019 statistics from production, I've done
two experiments in my development environment: (1) manually creating a
`statistics-2019` core and loading the 2019 hits into it, and (2) loading
data into the main `statistics` core and running the `dspace stats-util -s`
yearly sharding process. In both cases the core's data is online and
available immediately after it is loaded. The difference shows up the next
time I restart Tomcat: the manually created core is not loaded, while the
DSpace-created core is.
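To compare the two cases, here is a rough shell sketch of the discovery step Mark describes in the quoted message: scan the DSpace `solr` directory for `statistics-YYYY` directories and build a CoreAdmin CREATE URL for each one, with all cores sharing one instanceDir as in my manual curl command. The function names `list_year_cores` and `create_url` are made up for illustration, and this only approximates what I understand `initSolrYearCores` to do; it is not DSpace's code.

```shell
#!/bin/sh
# Hypothetical sketch (not DSpace code): approximate the yearly-core
# discovery Mark describes for SolrLoggerServiceImpl#initSolrYearCores.

# Print the name of every directory under $1 named statistics-YYYY.
list_year_cores() {
    local solr_dir="$1" dir name
    for dir in "$solr_dir"/statistics-*; do
        # Skip plain files, and the literal pattern when nothing matches
        [ -d "$dir" ] || continue
        name="$(basename "$dir")"
        case "$name" in
            statistics-[0-9][0-9][0-9][0-9]) echo "$name" ;;
        esac
    done
}

# Print a CoreAdmin CREATE URL registering core $3 against the shared
# "statistics" instanceDir with its own dataDir, in the same shape as
# my manual curl command ($1 = Solr base URL, $2 = DSpace solr dir).
create_url() {
    local solr_url="$1" solr_dir="$2" core="$3"
    printf '%s/admin/cores?action=CREATE&name=%s&instanceDir=%s/statistics&dataDir=%s/%s/data\n' \
        "$solr_url" "$core" "$solr_dir" "$solr_dir" "$core"
}
```

Running `list_year_cores /home/aorth/dspace/solr` should print `statistics-2019` if the manual core's directory is named and placed the way DSpace expects; if it does, the discovery step itself is probably not the problem, at least to the extent this sketch mirrors DSpace's code.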

Regarding DSpace doing something "hacky" by using multiple data-only cores
that share an instanceDir, I'm also wondering how that fits into Solr's
officially supported use cases! I want to add some debug logging to
SolrLoggerServiceImpl.java (DSpace 6.x) to try to understand why my
manually created core doesn't get loaded. Possibly related: about half the
time we start Tomcat on our production server, one of the cores fails to
load anyway! To be honest, it's making me a bit nervous about running with
all these shards (we have ten, going back to 2010!), and I am debating
whether I should just put everything back in the main statistics core.
What does migrating to a more modern Solr with DSpace 7 look like, given
our "hacky" sharding?
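If I do fold the shards back in, a rough sketch of the per-year step, reusing the solr-import-export-json flags from my commands quoted below, might look like this. The Solr URL, the `/tmp` dump paths and the `merge_shard` and `run` helpers are assumptions for illustration; with `DRY_RUN=1` set it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch: copy one yearly shard back into the main
# statistics core with solr-import-export-json's run.sh (run from the
# tool's directory). Set DRY_RUN=1 to print commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = Solr base URL, $2 = year of the shard to merge
merge_shard() {
    local solr_url="$1" year="$2"
    local dump="/tmp/statistics-$year.json"
    # Export all documents from the yearly shard, keyed on uid
    run ./run.sh -s "$solr_url/statistics-$year" -a export -o "$dump" -k uid || return 1
    # Load the same documents into the main statistics core
    run ./run.sh -s "$solr_url/statistics" -a import -o "$dump" -k uid
}
```

Something like `for year in 2010 2011 2012; do merge_shard http://localhost:8080/solr "$year"; done` would cover several shards. I would only unload a yearly core and remove its directory after confirming the main core gained the expected number of documents, since (going by Mark's reading of `initSolrYearCores`) DSpace registers any remaining `statistics-YYYY` directory again at startup.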

Regards,

On Thu, Feb 6, 2020 at 5:03 PM Mark H. Wood <mwoodiu...@gmail.com> wrote:

> On Thu, Feb 06, 2020 at 02:50:43PM +0200, Alan Orth wrote:
> > Our yearly Solr statistics sharding (stats-util -s) failed this year
> > because our core is very large (43GiB) and apparently timed out
> > somewhere. It failed again when I tried to run it manually:
> >
> > Moving: 51633080 into core statistics-2019
> > ...
> > Exception: Read timed out
> > java.net.SocketTimeoutException: Read timed out
> >
> > As a test I used this really great tool called solr-import-export-json to
> > export some of my 2019 statistics to JSON on the production server, then
> > import them into a new core in my development instance:
> >
> > $ ./run.sh -s http://localhost:8081/solr/statistics -a export \
> >     -o /tmp/statistics-2019-01.json -f 'dateYearMonth:2019-01' -k uid
> > $ curl 'http://localhost:8080/solr/admin/cores?action=CREATE&name=statistics-2019&instanceDir=/home/aorth/dspace/solr/statistics&dataDir=/home/aorth/dspace/solr/statistics-2019/data'
> > $ ./run.sh -s http://localhost:8080/solr/statistics-2019 -a import \
> >     -o /tmp/statistics-2019-01.json -k uid
> >
> > This worked brilliantly... in fact I am very impressed with this tool and
> > recommend it to people!
> >
> > The problem is, this core does not get enumerated automatically by Solr
> > after I restart the servlet container. I got it to load by hard-coding
> > the core into dspace/solr/solr.xml config, but it seems hacky. How are
> > these core shards enumerated by DSpace's Solr application? What would
> > cause shards to not be loaded automatically?
> >
> > My environment is DSpace 5.8 with Tomcat 7.0.99 and OpenJDK 8.
>
> I think that a good place to look is
> 'dspace-api/src/main/java/org/dspace/statistics/SolrLoggerServiceImpl#initSolrYearCores'.
> Also #createCore in the same class.  This is where DSpace enumerates
> the cores that it will use for statistics.  It seems to be looking for
> directories 'solr/statistics-YYYY'.  It will call CREATE in Solr's
> CoreAdmin API, which would seem to register a core if it already
> exists.  You seem to be doing the same thing, but there must be
> something slightly different about your actions.  Or perhaps the way
> you are testing -- it looks to me as though Solr is unaware of the
> additional cores at startup and is told of them by DSpace when *it*
> starts up.
>
> But I think it is actually DSpace that is doing something hacky:
> using the same InstanceDir for multiple cores.  I have no idea why
> that works.
>
> Sadly, SolrJ is almost entirely undocumented, at least in this area.
> I have had to puzzle out a lot of its working by reference to the web
> API documentation in the Solr Ref Guide.
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>
> --
> All messages to this mailing list should adhere to the DuraSpace Code of
> Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dspace-tech+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-tech/20200206150143.GF11530%40IUPUI.Edu
> .
>


-- 
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch

