npawar opened a new pull request #5296: URL: https://github.com/apache/incubator-pinot/pull/5296
We have `RecordReader` for batch and `StreamMessageDecoder` for stream. `RecordReader` has `GenericRow next()`, which operates on a record from a file. The decoder has `GenericRow decode(byte[] payload)`, which operates on a record from a stream. Both of them use the `RecordExtractor` internally (recent change: https://github.com/apache/incubator-pinot/pull/5238):

```
while (has next record) {
  // reads and extracts inside the record reader/decoder
  GenericRow nextRecord = recordReader.next();
  GenericRow transformedRow = recordTransformer.transform(nextRecord);
}
```

Now we want to take the decoupling one step further, i.e. pull the `RecordExtractor` out of the `RecordReader`/decoder, so the flow becomes **RecordReader -> RecordExtractor -> RecordTransformer**:

```
while (recordReader.hasNext()) {
  Object decodedRow = recordReader.next();
  GenericRow extractedRow = recordExtractor.extract(decodedRow, reuse);
  GenericRow transformedRow = recordTransformer.transform(extractedRow);
}
```

Similarly for realtime, the flow becomes Decoder -> Extractor -> Transformer:

```
for (message : messageBatch) {
  Object decodedRow = decoder.decode(message);
  GenericRow extractedRow = recordExtractor.extract(decodedRow, reuse);
  GenericRow transformedRow = recordTransformer.transform(extractedRow);
}
```

The motivations for doing this are:

1. Decouple the reader and the extractor: they make sense as sequential steps, not as one embedded within the other.
2. Eventually, we want to remove `RecordReader`'s dependency on the schema. This change takes us one step closer to that goal.
3. All these steps will make it easier to add other connectors.
4. We want to move `ExpressionEvaluators` and `SchemaFieldExtractor` to pinot-core, so that we can use pinot-core's expression parsing code to parse inbuilt Pinot functions. Currently they are all in pinot-spi, because they are needed by the RecordReaders (which live in pinot-plugins, and plugins cannot access core).

**NOTE: The `StreamMessageDecoder` and `RecordReader` interfaces have been changed.**
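
To make the proposed reader -> extractor -> transformer flow concrete, here is a minimal, self-contained Java sketch. Every type in it (`SketchRecordReader`, `SketchRecordExtractor`, `SketchRecordTransformer`, `DecoupledPipelineSketch`) is a hypothetical stand-in invented for illustration, and a plain `Map<String, Object>` stands in for `GenericRow`. None of these are Pinot's actual SPI signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: the reader only yields format-specific records and knows nothing about rows.
interface SketchRecordReader<T> {
  boolean hasNext();
  T next();
}

// Hypothetical stand-in: the extractor turns a format-specific record into a row.
interface SketchRecordExtractor<T> {
  Map<String, Object> extract(T from, Map<String, Object> to);
}

// Hypothetical stand-in: the transformer post-processes an already extracted row.
interface SketchRecordTransformer {
  Map<String, Object> transform(Map<String, Object> row);
}

final class DecoupledPipelineSketch {
  // Runs the three steps sequentially: read, then extract, then transform.
  static <T> List<Map<String, Object>> run(SketchRecordReader<T> reader,
      SketchRecordExtractor<T> extractor, SketchRecordTransformer transformer) {
    List<Map<String, Object>> out = new ArrayList<>();
    while (reader.hasNext()) {
      T decodedRow = reader.next();
      // A fresh row per record, because this sketch collects results instead of reusing one row.
      Map<String, Object> extractedRow = extractor.extract(decodedRow, new LinkedHashMap<>());
      out.add(transformer.transform(extractedRow));
    }
    return out;
  }

  // Demo wiring with an assumed "name,age" line format.
  static List<Map<String, Object>> demo() {
    final Iterator<String> lines = Arrays.asList("alice,30", "bob,25").iterator();
    SketchRecordReader<String> reader = new SketchRecordReader<String>() {
      @Override public boolean hasNext() { return lines.hasNext(); }
      @Override public String next() { return lines.next(); }
    };
    SketchRecordExtractor<String> extractor = (line, to) -> {
      String[] parts = line.split(",");
      to.put("name", parts[0]);
      to.put("age", Integer.parseInt(parts[1]));
      return to;
    };
    SketchRecordTransformer transformer = row -> {
      row.put("name", ((String) row.get("name")).toUpperCase());
      return row;
    };
    return run(reader, extractor, transformer);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the reader here is generic in its record type and never touches a row or schema, a new input format would only need a new reader/extractor pair, while the loop and the transformer stay unchanged.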
