I’d rather ship a tutorial and tooling that explain how to index the 
ref-guide than ship a binary index.
What other full-text datasets have you considered as candidates for 
getting-started examples?

Jan

> On 1 Sep 2020, at 05:53, Alexandre Rafalovitch <[email protected]> wrote:
> 
> I did not say it was trivial, but I also did not quite mention the previous 
> research.
> 
> https://github.com/arafalov/solr-refguide-indexing/blob/master/src/com/solrstart/refguide/Indexer.java
> 
> It uses the official AsciidoctorJ library directly. Not sure if that's just the 
> JRuby version of Asciidoctor we currently use to build. But this should only 
> affect the development process, not the final built package. 
> 
> I think I am mostly trying to figure out what people think about shipping an 
> actual core with the distribution. That is something I haven't seen done 
> before, and it may have issues I did not think of. 
> 
> Regards, 
>     Alex
> 
> On Mon., Aug. 31, 2020, 10:11 p.m. Gus Heck, <[email protected]> wrote:
> Some background to consider before committing to that... it might not be as 
> trivial as you think. (I've often thought it ironic that we don't have real 
> search for our ref guide... )
> 
> https://www.youtube.com/watch?v=DixlnxAk08s
> 
> -Gus
> 
> On Mon, Aug 31, 2020 at 2:06 PM Ishan Chattopadhyaya <[email protected]> wrote:
> I love the idea of using the ref guide itself as an example dataset. That 
> way, we won't need to ship anything separately. Python's Beautiful Soup can 
> extract text from the HTML pages. I'm sure there may be such things in Java 
> too (can Tika do this?).
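
To make the text-extraction step above concrete, here is a minimal sketch. It uses only Python's standard-library `html.parser` rather than Beautiful Soup or Tika, and the names `TextExtractor` and `extract_text` are made up for illustration. It is not how the ref guide is processed today.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside skipped elements and not pure whitespace.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A real page would also need its navigation and footer markup stripped, which this sketch does not attempt.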
> 
> On Mon, 31 Aug, 2020, 11:18 pm Alexandre Rafalovitch, <[email protected]> wrote:
> Hi,
> I need a sanity check.
> 
> I am in the planning stages for the new example datasets to ship with
> Solr 9. The one I am looking at is great for structured information,
> but is quite light on full-text content. So, I am thinking of how
> important that is and what other sources could be used.
> 
> One - only slightly - crazy idea is to use the Solr Reference Guide itself
> as a document source. I am not saying we need to include the guide
> with the Solr distribution, but:
> 1) I could include a couple of sample pages
> 2) I could index the whole guide (with custom Java code) during the
> final build and we could ship the full index (with stored=false) with
> Solr, which then basically becomes a local search for the remote guide
> (with absolute URLs).
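
As a rough sketch of option 2, each page could become one document whose `id` is the absolute URL of the remote page, so a search hit links straight to the online guide. Everything named here is an assumption for illustration: the `GUIDE_BASE_URL` placeholder, the `id`/`title`/`body` field names, and the helper functions. The `stored=false` setting would live in the schema and is not shown.

```python
import json
from urllib.parse import urljoin

# Placeholder only; the real guide location would be substituted at build time.
GUIDE_BASE_URL = "https://example.org/solr/guide/"


def to_solr_doc(relative_path, title, body, base_url=GUIDE_BASE_URL):
    """Build one document dict keyed by the page's absolute URL."""
    return {
        "id": urljoin(base_url, relative_path),  # absolute link to the remote page
        "title": title,
        "body": body,  # indexed for search; hypothetically stored=false in the schema
    }


def to_update_payload(docs):
    """Serialize documents as a JSON array, the shape a JSON update request takes."""
    return json.dumps(docs)
```

For example, `to_solr_doc("faceting.html", "Faceting", "...")` yields an `id` of `https://example.org/solr/guide/faceting.html` under the placeholder base URL.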
> 
> Either way would allow us to also explore what a good search
> configuration could look like for the Ref Guide for when we are
> actually ready to move beyond its current "headings-only" JavaScript
> search. Actually, done right, the same or a similar tool could also feed
> subheadings into the JavaScript search.
> 
> Like I said, sanity check?
> 
> Regards,
>    Alex.
> 
> 
> -- 
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
