gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
URL: 
https://github.com/apache/incubator-druid/issues/5584#issuecomment-410045743
 
 
   Yeah -- StringInputRowParser is a bit of a special snowflake. Maybe doing 
this feature is the motivation we need to rethink how that all works. To add to 
the points you raised, there's also this gross code in TransformSpec (which 
decorates InputRowParsers with expression-based transforms):
   
   ```
       if (parser instanceof StringInputRowParser) {
         // Hack to support the fact that some callers use special methods in 
StringInputRowParser, such as
         // parse(String) and startFileFromBeginning.
         return (InputRowParser<T>) new TransformingStringInputRowParser(
             parser.getParseSpec(),
             ((StringInputRowParser) parser).getEncoding(),
             this
         );
       } else {
         return new TransformingInputRowParser<>(parser, this);
       }
   ```
   
   And there's also the warty fact that ParseSpec's `makeParser()` method is 
actually only used by StringInputRowParser (and a couple of extension parsers 
that work by converting inputs to JSON and then parsing them as strings). Other 
parsers call `getTimestampSpec` and `getDimensionsSpec`, but they don't call 
`makeParser`. Some parsers, however, can extract useful things from certain 
types of ParseSpecs. The way this is done is weird, leading to code like this 
in the Avro extension to yank out an optional FlattenSpec:
   
   ```
       final JSONPathSpec flattenSpec;
       if (parseSpec != null && (parseSpec instanceof AvroParseSpec)) {
         flattenSpec = ((AvroParseSpec) parseSpec).getFlattenSpec();
       } else {
         flattenSpec = JSONPathSpec.DEFAULT;
       }
   ```
   
   The only reason StringInputRowParser doesn't have to do something like this 
is that the method _it_ needs (makeParser) has been granted special status as a 
member of the ParseSpec interface (even though many ParseSpecs don't implement 
it, they just throw exceptions).
   
   If we are wanting to be ambitious (I don't see why not!) then I think we can 
make nicer, clearer interfaces by rethinking how this all works. In a backwards 
compatible way, since we don't want all of our users to have to rewrite their 
ingestion specs (which will probably be the tricky part).
   
   @josephglanville what ambition level are you interested in having here? 😄

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to