I just wanted to pick this up, but somehow my JIRA account got
deactivated. Once I have that figured out, I'll try to propose the
change. Thank you!
On 28.02.20 14:13, David Smiley wrote:
Thanks for your input, David. I won't accept the patch because I think
there's a more appropriate way to go about this: have the Tagger
constructor take an Analyzer instead of a TokenStream, and then have
the process method take the InputStream and/or String (the fundamental
input to the tagger), thus allowing repeated use of the same Tagger.
It's been a long-standing FAQ: "How do I tag in bulk?" This change would
help with that somewhat, at least at the low level you need. I filed a
JIRA: SOLR-14292 - Refactor Tagger for re-use, thus aiding bulk-tagging
<https://issues.apache.org/jira/browse/SOLR-14292>. I don't plan on
doing this anytime soon, so feel free to take it up if you wish.
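
A minimal sketch of the refactoring described above, not the actual Solr
Tagger API: the analyzer is supplied once at construction and process()
is called once per input, so a single instance can serve many strings.
Everything in it is a hypothetical stand-in so the example runs without
Lucene on the classpath: the class name ReusableTaggerSketch, the
Function<String, List<String>> standing in for an Analyzer, and the
Set<String> standing in for the indexed terms are assumptions made for
illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical stand-in for a re-usable tagger; not the real Solr class. */
class ReusableTaggerSketch {
    // Stand-in for a Lucene Analyzer: turns raw text into tokens.
    private final Function<String, List<String>> analyzer;
    // Stand-in for the tag dictionary (the indexed terms).
    private final Set<String> dictionary;

    ReusableTaggerSketch(Function<String, List<String>> analyzer,
                         Set<String> dictionary) {
        this.analyzer = analyzer;
        this.dictionary = dictionary;
    }

    /** Tags one input. Safe to call repeatedly on the same instance. */
    List<String> process(String input) {
        List<String> tags = new ArrayList<>();
        // A fresh token list is produced per input; the tagger itself is reused.
        for (String token : analyzer.apply(input)) {
            if (dictionary.contains(token)) {
                tags.add(token);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Assumed analysis chain: lowercase, then split on whitespace.
        Function<String, List<String>> analyzer =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        Set<String> dictionary = new HashSet<>(Arrays.asList("solr", "lucene"));

        ReusableTaggerSketch tagger = new ReusableTaggerSketch(analyzer, dictionary);
        // The same instance handles several inputs in a row.
        System.out.println(tagger.process("Apache Solr builds on Lucene"));
        System.out.println(tagger.process("No known terms here"));
    }
}
```

In the real tagger, process() would presumably obtain a TokenStream from
the Analyzer for each input; whether that is cheap enough for bulk
tagging is not settled in this thread.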
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Fri, Feb 28, 2020 at 4:12 AM David '-1' Schmid
<david.sch...@vis.uni-stuttgart.de> wrote:
On 27.02.20 19:01, David Smiley wrote:
> I'm glad you got it working! It's sad you felt the need to copy-paste
> the tagger; perhaps you can recommend changes to make it more
> extensible so that you or others needn't fork it.
No need to feel sad. As I mentioned: it's quick and dirty, and I did
not know better.
I was wondering how to feed multiple Strings into the tagger without
creating new instances of everything, but as I don't know much about
how the tokenizers work, I just slapped everything together.
I had planned to maybe use an InputStream that blocks once one string
was exhausted, so I can feed the tags back into the stream and feed the
InputStream new data, once TupleStream::read is called again.
But since I wanted to get this done quickly, ... yeah. That happened.
Not happy with it, but I learned a lot.
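
A rough sketch of the refillable-input idea mentioned above, under two
simplifications that are not part of the original plan: it uses a
java.io.Reader rather than an InputStream, and it reports end-of-input
when the current string is exhausted instead of blocking. The class name
RefillableReader and its feed() and drain() methods are hypothetical
names chosen for illustration.

```java
import java.io.Reader;

/** Hypothetical reader that can be handed a new string after each one is consumed. */
class RefillableReader extends Reader {
    private String current = "";
    private int pos = 0;

    /** Supplies the next string; reading restarts at its first character. */
    void feed(String next) {
        current = next;
        pos = 0;
    }

    @Override
    public int read(char[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= current.length()) {
            return -1; // current string exhausted; caller may feed() another
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }

    /** Convenience for the sketch: returns whatever is left of the current string. */
    String drain() {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[16];
        int n;
        while ((n = read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RefillableReader reader = new RefillableReader();
        reader.feed("first document text");
        System.out.println(reader.drain());
        // Same reader instance, new input.
        reader.feed("second document text");
        System.out.println(reader.drain());
    }
}
```

Whether a consumer such as a tokenizer can continue after it has seen
end-of-input depends on that consumer and is not shown here.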
I'm not sure I'm qualified to recommend changes to the tagger. I'd
maybe change the constructor to not accept a TokenStream, but just the
configuration (reduce strategy, terms, ...), and provide a setter for
the TokenStream (patch attached).
But that assumes a TokenStream is cheap to construct and use, which I
don't know.
> I'm not sure if something like this should be contributed back to
> Solr itself. I don't even know the bigger picture of why you are
> doing this, so I am pessimistic :-).
Which is completely fine :D
Thank you for the guidance!
best regards,
David
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Feb 27, 2020 at 8:01 AM David '-1' Schmid
> <david.sch...@vis.uni-stuttgart.de> wrote:
>
> Hello again!
>
> On 25.02.20 22:39, David Smiley wrote:
> > I haven't worked on streaming expressions yet but I did a little
> > bit of digging around. I think the ClassifyStream might be somewhat
> > similar to learn from. It takes a stream of docs, not unlike what
> > you want. And crucially it implements setStreamContext with an
> > implementation which demonstrates how to get access to a SolrCore.
> > From a core, you can get a SolrIndexSearcher. [...]
>
> That worked beautifully! Or let's say: I got it working; the code is
> not beautiful as is.
> Would this be interesting/relevant enough to be adopted upstream?
>
> If so, should I open up a JIRA ticket?
>
> best regards,
> David
>
>
>
> > On Fri, Feb 21, 2020 at 8:05 AM David '-1' Schmid
> > <david.sch...@vis.uni-stuttgart.de> wrote:
> >
> > Hello dear developers!
> >
> > I've been wondering if I'd be able to adapt the current
> > TaggerRequestHandler for using it within the /stream request
> > handler.
> >
> > Starting out is a tad confusing, which I expected since I have
> > almost no experience with the solr/lucene codebase.
> >
> > My goal is as follows: I want to use the result of a previous
> > select(coll1, ...) as input for adding tags to the result document.
> >
> > Possibly:
> > tag(
> > select(...), field_to_analyze_for_tags,
> > collection_with_tag_dict, tag_dict_field,
> > ... // remaining tagger configuration options
> > )
> >
> > I'm currently stuck at two points in writing a
> > 'public class TaggerStream extends TupleStream implements Expressible':
> >
> > == Problem 1: Getting 'terms' ==
> >
> > The TaggerRequestHandler gets a SolrIndexSearcher via the request
> >
> > > final SolrIndexSearcher searcher = req.getSearcher();
> >
> > which in turn is used to acquire the terms
> >
> > > Terms terms = searcher.getSlowAtomicReader().terms(indexedField);
> >
> > which are used for tagging.
> >
> > I've tried finding something that will yield the equivalent, but
> > as you might have guessed: I didn't find anything so far.
> >
> >
> > == Problem 2: Multiple Shards ==
> >
> > I guess this might come up sooner or later, hence this is related
> > to SOLR-14190 (requesting the tagger to work across multiple
> > shards). I suspect (mind: I really don't know) that acquiring the
> > terms will have something to do with that, at least when we need to
> > merge the results from multiple shards, but I have not yet found
> > any code that does that.
> > Might have been blinded by my confusion, tho.
> >
> >
> > I'd be thankful if someone can help with any pointers regarding
> > this.
> >
> > best regards,
> > David
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >