npawar commented on issue #5238: Evaluate schema transform expressions during ingestion
URL: https://github.com/apache/incubator-pinot/pull/5238#issuecomment-613199104

> > > Will be nice if we can completely decouple schema from RecordReader and RecordExtractor, which will make future development much easier
> >
> > It's already decoupled from RecordExtractor. Do you mean pull up the schema even more, such that RecordReader also doesn't need Schema? What would we achieve by doing that?
>
> @npawar RecordExtractor is for stream ingestion, and RecordReader is for batch ingestion. Think of some users trying to add a new record reader, they don't need to understand what schema is, they only need to know here are the fields that should be read.
> This might be bigger change, so we can add a TODO and address it separately.

As per our offline sync-up, StreamMessageDecoder is the entry point for realtime ingestion, and RecordReader is the entry point for batch ingestion. The RecordExtractor is expected to be common to both of them; there is a picture of this in the design doc linked in the description.

I've added a TODO in the RecordReader class to further pull out Schema. For consistency, we should then do the same in StreamMessageDecoder as well. Since this is a bigger change, I will leave it out of the scope of this PR.
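
To make the "only need to know the fields that should be read" idea concrete, here is a minimal sketch of what a reader without Schema could look like. Everything in it is hypothetical: `FieldAwareRecordReader`, `InMemoryRecordReader`, and the `init(source, fieldsToRead)` signature are made up for illustration and are not Pinot's actual RecordReader API. The in-memory list of maps stands in for a real input file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical interface, NOT Pinot's actual RecordReader API.
// The reader is told which source fields to read and never sees a Schema.
interface FieldAwareRecordReader {
  void init(List<Map<String, Object>> source, Set<String> fieldsToRead);

  boolean hasNext();

  Map<String, Object> next();
}

// Hypothetical in-memory implementation, used only to illustrate the idea.
class InMemoryRecordReader implements FieldAwareRecordReader {
  private List<Map<String, Object>> rows;
  private Set<String> fieldsToRead;
  private int nextIndex;

  @Override
  public void init(List<Map<String, Object>> source, Set<String> fieldsToRead) {
    this.rows = new ArrayList<>(source);
    this.fieldsToRead = fieldsToRead;
    this.nextIndex = 0;
  }

  @Override
  public boolean hasNext() {
    return nextIndex < rows.size();
  }

  @Override
  public Map<String, Object> next() {
    Map<String, Object> raw = rows.get(nextIndex++);
    Map<String, Object> record = new HashMap<>();
    for (String field : fieldsToRead) {
      // Only requested fields are copied; a field missing from the source
      // row is stored as null, and unrequested fields are dropped.
      record.put(field, raw.get(field));
    }
    return record;
  }
}
```

In this sketch, whoever owns the Schema would compute the set of source field names (including any fields referenced by transform expressions) and pass it in, so someone writing a new reader would only deal with field names.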
