Re: Streaming Tagger

Eric Pugh Thu, 27 Feb 2020 05:41:29 -0800

David,   I know I’d love to see the code!   

I’ve been working on a streaming expression called “bump” that triggers an 
atomic-update reindex of a document.   I’m using it as part of a relevancy 
experimentation workflow: I add a new copyField or change my schema 
analyzers, and then I “bump” each document to make it reindex.   This way I 
don’t need to reindex from source.
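
For illustration only: the “bump” code itself is not shown in this thread, so 
the following is just a rough sketch of the kind of atomic-update payload such 
a function might send. It assumes the unique key field is "id" and that a 
counter field named "bump_count_i" exists; both names, and the BumpPayload 
class, are hypothetical. It also relies on Solr's usual atomic-update 
requirement that the other fields be stored or have docValues, so the document 
can be rebuilt and re-analyzed.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, NOT the actual "bump" implementation: it only builds
// the JSON body of a Solr atomic update that touches each document.
final class BumpPayload {

    // Escape backslashes and double quotes for use inside a JSON string.
    // (Control characters are not handled in this minimal sketch.)
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One atomic-update document: {"id":"<id>","<counterField>":{"inc":1}}
    // "inc" is one of Solr's atomic-update modifiers; "id" is an assumed key name.
    static String atomicBump(String id, String counterField) {
        return "{" + quote("id") + ":" + quote(id) + ","
                + quote(counterField) + ":{\"inc\":1}}";
    }

    // A JSON array of such documents, in the shape accepted by the /update handler.
    static String atomicBumpAll(List<String> ids, String counterField) {
        return ids.stream()
                .map(id -> atomicBump(id, counterField))
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(atomicBumpAll(List.of("doc1", "doc2"), "bump_count_i"));
    }
}
```

The resulting JSON could then be posted to a collection's /update endpoint; a 
streaming-expression version would presumably generate the same kind of update 
for each tuple coming from the wrapped stream.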


I’m planning on pushing up a GitHub repo with this function.

Eric


> On Feb 27, 2020, at 8:01 AM, David '-1' Schmid 
> <david.sch...@vis.uni-stuttgart.de> wrote:
> 
> Hello again!
> 
> On 25.02.20 22:39, David Smiley wrote:
>> I haven't worked on streaming expressions yet but I did a little bit of 
>> digging around.  I think the ClassifyStream might be somewhat similar to 
>> learn from.  It takes a stream of docs, not unlike what you want.  And 
>> crucially it implements setStreamContext with an implementation which 
>> demonstrates how to get access to a SolrCore.  From a core, you can get a 
>> SolrIndexSearcher. [...]
> 
> That worked beautifully! Or let's say: I got it working, the code is not 
> beautiful, as is.
> Would this be interesting/relevant enough to be adopted upstream?
> 
> If so, should I open up a JIRA ticket?
> 
> best regards,
> David
> 
> 
> 
>> On Fri, Feb 21, 2020 at 8:05 AM David '-1' Schmid 
>> <david.sch...@vis.uni-stuttgart.de 
>> <mailto:david.sch...@vis.uni-stuttgart.de>> wrote:
>>    Hello dear developers!
>>    I've been wondering if I'd be able to adapt the current
>>    TaggerRequestHandler for using it within the /stream request handler.
>>    Starting out is a tad confusing, which I expected since I have almost
>>    no experience with the solr/lucene codebase.
>>    My goal is as follows: I want to use the result of a previous
>>    select(coll1, ...) as input for adding tags to the result document.
>>    Possibly:
>>    tag(
>>        select(...), field_to_analyze_for_tags,
>>        collection_with_tag_dict, tag_dict_field,
>>        ... // remaining tagger configuration options
>>    )
>>    I'm currently stuck at some steps in writing a
>>    'public class TaggerStream extends TupleStream implements Expressible'
>>    at two points:
>>    == Problem 1: Getting 'terms' ==
>>    The TaggerRequestHandler gets a SolrIndexSearcher via the request
>>      > final SolrIndexSearcher searcher = req.getSearcher();
>>    Which in turn is used to acquire the terms
>>      > Terms terms = searcher.getSlowAtomicReader().terms(indexedField);
>>    which are used for tagging.
>>    I've tried finding something that will yield the equivalent, but as you
>>    might have guessed: I didn't find anything so far.
>>    == Problem 2: Multiple Shards ==
>>    I guess, this might come up sooner or later, hence this is related to
>>    SOLR-14190 (requesting the tagger to work across multiple shards).
>>    I suspect (mind: I really don't know) that acquiring the terms will
>>    have to do something with that, at least when we need to merge the
>>    results from multiple shards, but I have not yet found any code that
>>    does that.
>>    Might have been blinded by my confusion, tho.
>>    I'd be thankful if someone can help with any pointers regarding this.
>>    best regards,
>>    David
>>    ---------------------------------------------------------------------
>>    To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>    <mailto:dev-unsubscr...@lucene.apache.org>
>>    For additional commands, e-mail: dev-h...@lucene.apache.org
>>    <mailto:dev-h...@lucene.apache.org>
> 

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
