I’d rather ship a tutorial and tooling that explain how to index the ref-guide than ship a binary index. What other full-text datasets have you considered as candidates for getting-started examples?
Jan

> On 1 Sep 2020, at 05:53, Alexandre Rafalovitch <[email protected]> wrote:
>
> I did not say it was trivial, but I also did not quite mention the previous
> research.
>
> https://github.com/arafalov/solr-refguide-indexing/blob/master/src/com/solrstart/refguide/Indexer.java
>
> Uses the official AsciidoctorJ library directly. Not sure if that's just the JRuby
> version of Asciidoctor we currently use to build. But this should only affect
> the development process, not the final built package.
>
> I think I am more trying to figure out what people think about shipping an
> actual core with the distribution. That is something I haven't seen done
> before. And it may have issues I did not think of.
>
> Regards,
> Alex
>
> On Mon., Aug. 31, 2020, 10:11 p.m. Gus Heck, <[email protected]> wrote:
> Some background to consider before committing to that... it might not be as
> trivial as you think. (I've often thought it ironic that we don't have real
> search for our ref guide... )
>
> https://www.youtube.com/watch?v=DixlnxAk08s
>
> -Gus
>
> On Mon, Aug 31, 2020 at 2:06 PM Ishan Chattopadhyaya
> <[email protected]> wrote:
> I love the idea of using the ref guide itself as an example dataset. That
> way, we won't need to ship anything separately. Python's Beautiful Soup can
> extract text from the HTML pages. I'm sure there may be such things in Java
> too (can Tika do this?).
>
> On Mon, 31 Aug, 2020, 11:18 pm Alexandre Rafalovitch, <[email protected]> wrote:
> Hi,
> I need a sanity check.
>
> I am in the planning stages for the new example datasets to ship with
> Solr 9. The one I am looking at is great for structured information,
> but is quite light on full-text content. So, I am thinking of how
> important that is and what other sources could be used.
>
> One - only slightly - crazy idea is to use the Solr Reference Guide itself
> as a document source. I am not saying we need to include the guide
> with the Solr distribution, but:
> 1) I could include a couple of sample pages
> 2) I could index the whole guide (with custom Java code) during the
> final build and we could ship the full index (with stored=false) with
> Solr, which then basically becomes a local search for the remote guide
> (with absolute URLs).
>
> Either way would allow us to also explore what a good search
> configuration could look like for the Ref Guide for when we are
> actually ready to move beyond its current "headings-only" JavaScript
> search. Actually, done right, the same or a similar tool could also feed
> subheadings into the JavaScript search.
>
> Like I said, sanity check?
>
> Regards,
> Alex.
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
