If the generics could be contained _instead of_ spreading to the UH class itself (making UH typed), I think it could be nice. But given the per-field possible settings for formatting... that in particular makes balancing these concerns hard. I guess in the end Object isn't too bad since it's limited to the advanced method use-case (highlightFieldsAsObjects).
On Wed, Jan 11, 2017 at 8:45 AM Timothy Rodriguez (BLOOMBERG/ 120 PARK) < [email protected]> wrote: > While we were open sourcing it. I had tried creating a patch to generify > it, but the generics did wind up all over the place. Ultimately the > UnifiedHighlighter would need to be generic itself so it can ensure the > passage formatters etc are of the same type. (Or alternatively, generic > passage formatters are passed in per request.) I wound up dumping the > changes because they were quite substantial and they'd also push it further > from the PostingsHighlighjer. > > I'm hoping to get back to trying that again in he future. It'd be nice to > have a PassageFormatter<MyCustomObject>. > > -Tim > > Sent from Bloomberg Professional for iPhone > > > ----- Original Message ----- > From: Dawid Weiss <[email protected]> > To: [email protected] > CC: TIMOTHY RODRIGUEZ, [email protected] > At: 11-Jan-2017 08:37:37 > > Thanks David! > > That's almost exactly what I ended up doing. I don't mind casting > Object to my own type; you can always make it a covariant override in > your subclass (which you have to do to access those expert-level > methods anyway). > > I still kind of think startOffset/endOffset and other related methods > could be made public to allow tinkering with them in > FieldHighlighter#highlightOffsetsEnums (otherwise this method is > protected for overriding, but useless in practice). > > There is another API problem I found too. If you wish to override > FieldHighlighter.getSummaryPassagesNoHighlight you can't return > anything sensible because Passage is final, contains only > package-private fields and addMatch is package-private too. So you > can't create a "custom" passage. > > I can file an issue and provide a patch if these changes are not > against the design of the unified highlighter? > > Dawid > > On Wed, Jan 11, 2017 at 2:24 PM, David Smiley <[email protected]> > wrote: > > Hi Dawid, > > > > You could write a trivial PassageFormatter that simply returns the > Passage > > list instead of doing formatting. Passages contain offsets. And yes, > > WholeBreakIterator if you don't need passage fragmentation. Unless I'm > > missing some aspect of your requirements, this doesn't involve any > internal > > highlighter customizing. Perhaps Javadocs could be improved to make this > > more clear... and perhaps this Passage-returning PassageFormatter could > be > > included to clarify how it's done. I recall doing or seeing this recently > > months ago but I'm not sure. > > > > One ugly aspect of the API (shared with it's PostingsHighlighter lineage) > > related to this discussion is that the PassageFormatter is declared to > > return Object. It's kinda hard to rectify it to be typed, perhaps with > > generics, while also not spilling lots of generics to other places (the > UH > > itself) just because of this. Perhaps UH.highlightFieldsAsObjects() > could > > be modified to take a Class to thus provide a type for the output... and > > maybe the PassageFormatter could declare not only with generics but with > a > > method what types of results it produces. I'm curious what you think. > > > > ~ David > > > > > > On Wed, Jan 11, 2017 at 6:02 AM Dawid Weiss <[email protected]> > wrote: > >> > >> To follow-up: I hacked into the offsets by passing WholeBreakIterator > >> and a custom PassageFormatter that just returns the matches from the > >> singleton resulting passage. This is suboptimal though, as there's > >> still some complex logic going on in highlightOffsetsEnums that could > >> be avoided. > >> > >> Dawid > >> > >> On Wed, Jan 11, 2017 at 11:34 AM, Dawid Weiss <[email protected]> > >> wrote: > >> > Can any of the folks who contributed to UnifiedHighlighter (David?) > >> > clarify my thinking here? > >> > > >> > I have a requirement to extract (for a set of search results) a list > >> > of exact "hit" ranges (field offsets, with support for multi-term > >> > queries and span queries). Obviously, I'm only talking about queries > >> > that relate to field content somehow, but this has always been quite > >> > problematic and required the use of multiple helper classes > >> > (WeightedSpanTermExtractor, MultiTermHighlighting, etc.) and pretty > >> > hairy logic. > >> > > >> > So I turned to look at UnifiedHighlighter for help. > >> > > >> > Seems like the right way (?) to do it would be to override (and abuse) > >> > UnifiedHighlighter's getFieldHighlighter method and return a field > >> > highlighter with an override of: > >> > > >> > protected Passage[] highlightOffsetsEnums(List<OffsetsEnum> > >> > offsetsEnums) throws IOException { > >> > > >> > so that I can capture and return a separate Passage for each > >> > OffsetsEnum (I have my own code to deal with overlaps and merging, so > >> > I can skip this entirely). Then, with a custom no-op PassageFormatter > >> > I could simply get a list of those offsets. > >> > > >> > The problem with this approach is that there is currently no way to > >> > access offsets in OffsetsEnum -- everything is protected (so > >> > subclassable), but OffsetsEnum are closed to package-private scope. > >> > Namely these two: > >> > > >> > int startOffset() throws IOException { > >> > return postingsEnum.startOffset(); > >> > } > >> > > >> > int endOffset() throws IOException { > >> > return postingsEnum.endOffset(); > >> > } > >> > > >> > Should these two be protected to allow such customizations (I agree > >> > it's *very* low-level, but I have a practical use case where this > >> > would be useful). > >> > > >> > Am I on the right track here? > >> > > >> > Separately from that, I think it'd be nice to have some sort of > >> > generic utility that, for a given document (or a set of documents) > >> > would return such hit ranges... UnifiedHighlighter seems > >> > > >> > Dawid > > > > -- > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > > http://www.solrenterprisesearchserver.com > > -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
