npawar opened a new pull request #5296:
URL: https://github.com/apache/incubator-pinot/pull/5296


   We have `RecordReader` for batch and `StreamMessageDecoder` for stream.
   `RecordReader` has `GenericRow next()`, which operates on a record from 
a file. The decoder has `GenericRow decode(byte[] payload)`, which operates on 
a record from a stream.
   Both of them use the `RecordExtractor` internally (recent change: 
https://github.com/apache/incubator-pinot/pull/5238).
   ```
   while (recordReader.hasNext()) {
     // reads and extracts inside the record reader/decoder
     GenericRow nextRecord = recordReader.next();
     GenericRow transformedRow = recordTransformer.transform(nextRecord);
   }
   ```
   Similarly for realtime: Decoder -> Extractor -> Transformer
   ```
   for (byte[] message : messageBatch) {
     Object decodedRow = decoder.decode(message);
     GenericRow extractedRow = recordExtractor.extract(decodedRow, reuse);
     GenericRow transformedRow = recordTransformer.transform(extractedRow);
   }
   ```
   
   Now we want to take the decoupling one step further, i.e. pull the 
`RecordExtractor` out of the `RecordReader`/decoder,
   so that the flow becomes **RecordReader -> RecordExtractor -> RecordTransformer**:
   ```
   while (recordReader.hasNext()) {
     Object decodedRow = recordReader.next();
     GenericRow extractedRow = recordExtractor.extract(decodedRow, reuse);
     GenericRow transformedRow = recordTransformer.transform(extractedRow);
   }
   ```
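
To make the proposed flow concrete, here is a minimal, self-contained sketch. The interfaces below are simplified, hypothetical stand-ins and not the actual Pinot interfaces: a plain `Map<String, Object>` takes the place of `GenericRow`, and the CSV-like reader, the `ingest` helper, and the `ageInMonths` field are made up for illustration. It only shows that the reader hands over a raw, format-specific object, and that extraction and transformation run as separate steps afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; not the Pinot interfaces.
final class DecoupledPipelineSketch {

  /** Reads raw, format-specific records and does no field extraction. */
  interface RecordReader<T> {
    boolean hasNext();
    T next();
  }

  /** Copies the fields of interest from a raw record into a reusable row. */
  interface RecordExtractor<T> {
    Map<String, Object> extract(T from, Map<String, Object> reuse);
  }

  /** Applies transformations to an already extracted row. */
  interface RecordTransformer {
    Map<String, Object> transform(Map<String, Object> row);
  }

  /** Runs the three steps in sequence: read -> extract -> transform. */
  static <T> List<Map<String, Object>> ingest(RecordReader<T> recordReader,
      RecordExtractor<T> recordExtractor, RecordTransformer recordTransformer) {
    List<Map<String, Object>> result = new ArrayList<>();
    Map<String, Object> reuse = new HashMap<>();
    while (recordReader.hasNext()) {
      T decodedRow = recordReader.next();
      reuse.clear();
      Map<String, Object> extractedRow = recordExtractor.extract(decodedRow, reuse);
      Map<String, Object> transformedRow = recordTransformer.transform(extractedRow);
      // Copy the row, because the reusable map is overwritten on the next iteration.
      result.add(new HashMap<>(transformedRow));
    }
    return result;
  }

  /** Wires up a toy reader, extractor and transformer over two in-memory lines. */
  static List<Map<String, Object>> demo() {
    final Iterator<String> lines = Arrays.asList("alice,30", "bob,25").iterator();

    // The reader only splits the line; it knows nothing about field names.
    RecordReader<String[]> reader = new RecordReader<String[]>() {
      @Override
      public boolean hasNext() {
        return lines.hasNext();
      }

      @Override
      public String[] next() {
        return lines.next().split(",");
      }
    };

    // The extractor maps raw positions to named fields.
    RecordExtractor<String[]> extractor = (from, reuse) -> {
      reuse.put("name", from[0]);
      reuse.put("age", Integer.parseInt(from[1]));
      return reuse;
    };

    // The transformer derives a new field from an extracted one.
    RecordTransformer transformer = row -> {
      row.put("ageInMonths", (Integer) row.get("age") * 12);
      return row;
    };

    return ingest(reader, extractor, transformer);
  }

  public static void main(String[] args) {
    for (Map<String, Object> row : demo()) {
      System.out.println(row);
    }
  }
}
```

In this sketch the reader never sees field names, so swapping the input format only requires a new reader and extractor pair, while the transformer stays unchanged.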
   
   The motivations for doing this are:
   1. Decouple the reader and the extractor, as they make sense as sequential 
steps rather than one embedded within the other.
   2. Eventually, we want to remove `RecordReader`'s dependency on Schema. This 
change takes us one step closer to that goal.
   3. All these steps will make it easier to add other connectors.
   4. We want to move `ExpressionEvaluators` and `SchemaFieldExtractor` to 
pinot-core, so that we can use pinot-core's expression parsing code to 
parse built-in Pinot functions. Currently they are all in pinot-spi, because 
they are needed by the RecordReaders, which live in pinot-plugins, and plugins 
cannot access core.
   
   **NOTE: The `StreamMessageDecoder` and `RecordReader` interfaces have been changed.**

