Some background to consider before committing to that... it might not be as
trivial as you think. (I've often thought it ironic that we don't have real
search for our ref guide...)

https://www.youtube.com/watch?v=DixlnxAk08s

-Gus

On Mon, Aug 31, 2020 at 2:06 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> I love the idea of using the ref guide itself as an example dataset. That
> way, we won't need to ship anything separately. Python's Beautiful Soup can
> extract text from the HTML pages. I'm sure there may be such things in Java
> too (can Tika do this?).
>
> On Mon, 31 Aug, 2020, 11:18 pm Alexandre Rafalovitch, <arafa...@gmail.com>
> wrote:
>
>> Hi,
>> I need a sanity check.
>>
>> I am in the planning stages for the new example datasets to ship with
>> Solr 9. The one I am looking at is great for structured information,
>> but is quite light on full-text content. So, I am thinking of how
>> important that is and what other sources could be used.
>>
>> One - only slightly - crazy idea is to use the Solr Reference Guide itself
>> as a document source. I am not saying we need to include the guide
>> with the Solr distribution, but:
>> 1) I could include a couple of sample pages
>> 2) I could index the whole guide (with custom Java code) during the
>> final build and we could ship the full index (with stored=false) with
>> Solr, which then basically becomes a local search for the remote guide
>> (with absolute URLs).
>>
>> Either way would allow us to also explore what a good search
>> configuration could look like for the Ref Guide for when we are
>> actually ready to move beyond its current "headings-only" JavaScript
>> search. Actually, done right, the same or a similar tool could also feed
>> subheadings into the JavaScript search.
>>
>> Like I said, sanity check?
>>
>> Regards,
>>    Alex.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
