[jira] [Commented] (SOLR-4892) Add field update processors to parse/convert String-typed fields to Date, Number, and Boolean

Hoss Man (JIRA) Tue, 25 Jun 2013 10:35:27 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693188#comment-13693188
 ]


Hoss Man commented on SOLR-4892:
--------------------------------


Steve: for the most part the patch looks good, but a couple of things jumped 
out at me though as seeming kind of fishy...

1) AllValuesOrNoneFieldMutatingUpdateProcessor semantics

The "if (destVal == srcVal) ... do nothing" logic seems like a trap for future 
devs who want to subclass -- not to mention i'm not 100% certain i agree it 
makes sense for the subclasses you've already written.  (If I send the list 
[new String("6.0"), new Double(7.3D), new String("4.3")] to 
ParseDOubleUpdateProcessor should it really reject the entire list because one 
of the original values is already a double?)

I think it would be more straight forward to give subclasses another singleton 
marker object (similar to the DELETE singleton) they can return from 
mutateValue() to say "abort processing this list i have encountered an object i 
do not want".  If mutateValue() returns the same object it was passed in, that 
should be considered a successful mutation.


2) ParsingFieldUpdateProcessor overriding processAdd

Some code duplication here that worries me.  If we want to make the change I 
suggested above for parsing strings in lists where we can and allowing type 
specific objects if they are already the correct type, then (if i'm thinking 
about this correctly) we don't need this class at all, classes like 
ParseFooFieldUpdateProcessor could easily just handle the CharSequence check as 
part of their basic logic...

{code}
  if (val instanceof CharSequence) {
     val = ... attempt parse as a Foo...;
  }
  if (val instanceof Foo) return val;
  return SKIP_FIELD_VALUE_LIST_SINGLETON
{code}

But even if we want to keep the current semantics of 
AllValuesOrNoneFieldMutatingUpdateProcessor, then i think it would be cleaner 
if ParsingFieldUpdateProcessor was changed to be really simple, ala...

{code}
ParsingFieldUpdateProcessor extends AllValuesOrNoneFieldMutatingUpdateProcessor 
{
  public ParsingFieldUpdateProcessor(FieldNameSelector selector, 
UpdateRequestProcessor next) {
    super(selector, next);
  }
  protected final Object mutateValue(final Object srcVal) {
    return (srcVal instanceof CharSequence) ? 
      parse(srcVal.toString()) : SKIP_FIELD_VALUE_LIST_SINGLETON;
  }
  protected abstract Object parse(final Object srcVal);
}
{code}

...and then the concrete subclasses just implement "parse"

3) either way we go with #2, that means we don't need the access modifieier 
change on 'selector' in FieldMutatingUpdateProcessor.

4) double check the handling of field boosts in 
AllValuesOrNoneFieldMutatingUpdateProcessor ... pretty sure it isn't being 
preserved (see FieldValueMutatingUpdateProcessor for example)

5) The SortableFooField FieldType's should also be decorated with your new 
interfaces.

6) Can we refactor the locale init param parsing into a helper utility class or 
a common factory bsae class so it's not duplicated in both 
ParseNumericFieldUpdateProcessorFactory and 
ParseDateFieldUpdateProcessorFactory ?

----

And some misc questions...

7) Why no ParseIntFieldUpdateProcessor and ParseFloatFieldUpdateProcessor ?

8) would it be simpler for users to have a single "locale" init param that 
accepted things like "ru" and "ru_RU" and "ru_RU_xxx" using 
LocaleUtils.toLocale in commons-lang?

9) we should be able to make the ParseFooFieldUpdateProcessor inner classses 
into static inner classes just by passing hte locale into the constructor and 
right?

                
> Add field update processors to parse/convert String-typed fields to Date, 
> Number, and Boolean
> ---------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4892
>                 URL: https://issues.apache.org/jira/browse/SOLR-4892
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 4.4
>
>         Attachments: SOLR-4892.patch
>
>
> Add {{FieldMutatingUpdateProcessorFactory}} subclasses 
> {{ParseFooUpdateProcessorFactory}}, where {{Foo}} includes {{Date}}, 
> {{Double}}, {{Long}}, and {{Boolean}}.
> These factories will have a default selector that matches all fields that 
> either don’t match any schema field, or are in the schema with the 
> corresponding {{typeClass}}.  They can optionally support a list of multiple 
> format specifiers.  If they see a value that is not a CharSequence, or can't 
> parse the value using a configured format, they ignore it and leave it as is.
> For multi-valued fields, these processors will not convert any values unless 
> all are first successfully parsed.  Ordering the processors [Boolean, Long, 
> Double, Date] will allow e.g. values [2, 5, 8.6] to be left alone by the 
> Boolean and Long processors, but then converted by the Double processor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4892) Add field update processors to parse/convert String-typed fields to Date, Number, and Boolean

Reply via email to