Erick:
(Sorry, I misspelled your name in my last email.)
I tried a bunch of solutions, but none worked as I expected.
Basically, none of them sorts the documents by the pattern the way I expect.
This is my simplified code:
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;

public class PatternFieldComparatorSource extends FieldComparatorSource {

    private String pattern;
    private boolean ascending = false;

    public PatternFieldComparatorSource(String pattern, boolean ascending) {
        this.ascending = ascending;
        this.pattern = pattern;
    }

    public FieldComparator newComparator(String fieldname, int numHits,
            int sortPos, boolean reversed) throws IOException {
        return new PatternFieldComparator(numHits, fieldname);
    }

    class PatternFieldComparator extends FieldComparator {
        private final int[] values;
        private int[] currentReaderValues;
        private final String field;
        private int bottom; // value of the bottom of the queue

        PatternFieldComparator(int numHits, String field) {
            values = new int[numHits];
            this.field = field;
        }

        public int compare(int slot1, int slot2) {
            // TODO: there are sneaky non-branch ways to compute
            // -1/+1/0 sign
            // Cannot return values[slot1] - values[slot2] because that
            // may overflow
            final int v1 = values[slot1];
            final int v2 = values[slot2];
            if (v1 > v2) {
                return 1;
            } else if (v1 < v2) {
                return -1;
            } else {
                return 0;
            }
        }

        public int compareBottom(int doc) {
            // TODO: there are sneaky non-branch ways to compute
            // -1/+1/0 sign
            // Cannot return bottom - values[slot2] because that
            // may overflow
            final int v2 = currentReaderValues[doc];
            if (bottom > v2) {
                return 1;
            } else if (bottom < v2) {
                return -1;
            } else {
                return 0;
            }
        }

        public void copy(int slot, int doc) {
            values[slot] = currentReaderValues[doc];
        }

        public void setNextReader(IndexReader reader, int docBase) throws IOException {
            // Parse every term of the field through getValueByPattern
            currentReaderValues = FieldCache.DEFAULT.getInts(reader, field,
                    new FieldCache.IntParser() {
                public final int parseInt(final String val) {
                    return getValueByPattern(val);
                }
            });
        }

        public void setBottom(final int bottom) {
            this.bottom = values[bottom];
        }

        public Comparable value(int slot) {
            return values[slot];
        }
    }

    private Integer getValueByPattern(String text) {
        // If the pattern is not present, I return the max or min possible
        // value (depending on whether the sort is ascending or descending).
        int value = !ascending ? Integer.MAX_VALUE : Integer.MIN_VALUE;
        // If the pattern is present...
        if (text.contains(pattern)) {
            value = Integer.parseInt(...); // extract the value and return
        }
        return Integer.valueOf(value);
    }
}
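
For the simpler case described in the first message of this thread (take the lowest number embedded in the field's words), the extraction step could be sketched in plain Java roughly as follows. This is only a sketch under assumptions: MinNumberExtractor and minNumber are hypothetical names, not part of Lucene or of the code above, and "a number" is assumed to mean any run of ASCII digits.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Lucene class): finds the lowest number embedded in a field value. */
final class MinNumberExtractor {
    // Assumption: any run of ASCII digits is a number, e.g. the "2" in "val2".
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    private MinNumberExtractor() {
    }

    /** Returns the smallest number found in text, or an empty result if text contains none. */
    static OptionalInt minNumber(String text) {
        Matcher matcher = DIGITS.matcher(text);
        boolean found = false;
        int min = Integer.MAX_VALUE;
        while (matcher.find()) {
            try {
                min = Math.min(min, Integer.parseInt(matcher.group()));
                found = true;
            } catch (NumberFormatException e) {
                // Digit run does not fit in an int; this sketch simply skips it.
            }
        }
        return found ? OptionalInt.of(min) : OptionalInt.empty();
    }
}
```

With the example documents from that message, minNumber("val5 aaaa val1") yields 1 and minNumber("val2 aaaa val3") yields 2, so doc2 would sort ahead of doc1. The same helper could also be called at indexing time to fill a separate sort field.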
My code does not sort correctly, and I can't find an explanation why.
Thanks
Víctor
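
One thing that may be worth checking is the sentinel that getValueByPattern returns for documents without the pattern. The sketch below is plain Java, not Lucene code. It assumes that a descending sort is produced by reversing the natural int order outside the comparator, and the names SentinelDemo, sentinelAsPosted and sentinelMissingLast are hypothetical. Under that assumption, the choice in the posted code puts pattern-less documents first in both directions, while the opposite choice puts them last. Which of the two is intended is not stated in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo (not Lucene code) of where a "missing" sentinel lands after sorting. */
final class SentinelDemo {

    private SentinelDemo() {
    }

    /** Same choice as in the posted getValueByPattern. */
    static int sentinelAsPosted(boolean ascending) {
        return !ascending ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    /** The opposite choice. */
    static int sentinelMissingLast(boolean ascending) {
        return ascending ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    /**
     * Returns a sorted copy of values. Assumption: a descending sort is just the
     * natural int order reversed, applied outside the value comparison.
     */
    static List<Integer> sort(List<Integer> values, boolean ascending) {
        List<Integer> copy = new ArrayList<>(values);
        Comparator<Integer> natural = Comparator.naturalOrder();
        copy.sort(ascending ? natural : natural.reversed());
        return copy;
    }
}
```

For example, sorting the values 2, 1 and sentinelAsPosted(true) in ascending order puts the sentinel in the first position, whereas using sentinelMissingLast(true) instead puts it in the last position.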
On Sat, Jan 17, 2015 at 9:04 PM, Erick Erickson <[email protected]>
wrote:
> Ah, OK. H.L. Mencken wrote something like:
> "For every complex problem there is a solution
> that is simple, elegant, and wrong". I specialize in these...
>
> I don't have a good answer for your question then. How
> is what you're trying failing?
>
> Best,
> Erick
>
> On Fri, Jan 16, 2015 at 4:59 PM, Victor Podberezski
> <[email protected]> wrote:
> > Erik, Thanks for your reply.
> >
> > I wrote a simplification of the problem. The values in the field that
> > can be sorted are not only "val1, val2, ..."; they can also be
> > "patternX1, patternX2", etc.
> >
> > In that case I need to sort according to different criteria. There are
> > a lot of different patterns, but not too many documents in the result
> > of the query filter.
> > For that reason I think the best way is a custom FieldComparator.
> >
> > Thanks
> > Víctor Podberezski
> >
> > On Fri, Jan 16, 2015 at 9:31 PM, Erick Erickson <[email protected]>
> > wrote:
> >
> >> Personally I would do this on the ingestion side with a new field.
> >> That is, analyze the input field when you were indexing the doc,
> >> extract the min value from any numbers, and put that in a
> >> new field. Then it's simply sorting by the new field. This is likely
> >> to be much more performant than reprocessing this at query
> >> time in a comparator.
> >>
> >> FWIW,
> >> Erick
> >>
> >> On Fri, Jan 16, 2015 at 4:00 PM, Victor Podberezski
> >> <[email protected]> wrote:
> >> > I need a hand with a custom comparator.
> >> >
> >> > I have a field filled with words separated by spaces. Some words have
> >> > numbers inside.
> >> >
> >> > I need to extract those numbers and sort the documents by this number.
> >> > If there is more than one number, I need the lowest.
> >> >
> >> > For example:
> >> >
> >> > doc1 "val2 aaaa val3" --> 2, 3 --> 2
> >> > doc2 "val5 aaaa val1" --> 5, 1 --> 1
> >> > doc3 "val7 bbbbb val5" --> 7, 5 --> 5
> >> >
> >> > the sorted results have to be:
> >> >
> >> > doc2
> >> > doc1
> >> > doc3
> >> >
> >> > how can I achieve this?
> >> >
> >> > I have trouble migrating a working solution from Lucene 2.4 to Lucene
> >> > 3.9 or higher (migrating from ScoreDocComparator to FieldComparator).
> >> >
> >> > I tried this:
> >> >
> >> > public void setNextReader(IndexReader reader, int docBase) throws IOException {
> >> >     currentReaderValues = FieldCache.DEFAULT.getInts(reader, field,
> >> >             new FieldCache.IntParser() {
> >> >         public final int parseInt(final String val) {
> >> >             return extractNumber(val);
> >> >         }
> >> >     });
> >> > }
> >> >
> >> > The rest is the same as IntComparator, but this is not working.
> >> >
> >> > Does anybody have an idea how to resolve this problem?
> >> > Thanks,
> >> >
> >> > Víctor Podberezski
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >>