lksvenoy-r7 opened a new issue, #8889:
URL: https://github.com/apache/pinot/issues/8889

   The current Pinot Flink connector does not handle errors gracefully. Because 
of the way the connector works, if it fails partway through adding segments to 
a table, the table is left with an inconsistent view. Additionally, the 
connector does not currently support refresh tables. Refresh tables require 
atomic segment replacement, but the connector naively uploads segments as 
they are built.
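
   To illustrate the difference, here is a minimal in-memory sketch. It is not 
Pinot or connector code: `SegmentViewModel`, `naiveUpload`, `atomicReplace` and 
the segment names are hypothetical, and the "table" is just a list held in an 
`AtomicReference`. It only models why publishing segments one at a time can 
leave a mixed view after a failure, while staging first and swapping once 
cannot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory model of the segment list a query would see. */
final class SegmentViewModel {
    private final AtomicReference<List<String>> visible;

    SegmentViewModel(List<String> initialSegments) {
        this.visible = new AtomicReference<>(List.copyOf(initialSegments));
    }

    List<String> visibleSegments() {
        return visible.get();
    }

    /**
     * Publishes each segment as soon as it is "uploaded".
     * Throws before handling index failAt (use -1 for no failure).
     */
    void naiveUpload(List<String> segments, int failAt) {
        for (int i = 0; i < segments.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("upload failed at segment " + i);
            }
            List<String> next = new ArrayList<>(visible.get());
            next.add(segments.get(i));
            visible.set(List.copyOf(next)); // already visible to readers
        }
    }

    /**
     * Stages every segment first, then replaces the visible list in one step.
     * A failure while staging leaves the visible list untouched.
     */
    void atomicReplace(List<String> segments, int failAt) {
        List<String> staged = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("staging failed at segment " + i);
            }
            staged.add(segments.get(i));
        }
        visible.set(List.copyOf(staged)); // single swap
    }
}
```

   With a failure injected at the second segment, the naive path leaves old and 
new segments mixed together, while the atomic path leaves the old segments 
exactly as they were.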
   
   From testing the connector in production, I've also identified a few 
performance issues. These have a few different causes: the Avro serialization 
is not configurable, and neither is the file writing (for example, to use 
different block sizes).
   
   I have written a Flink connector based on this one, but with some heavy 
amendments. First of all, it implements WithPostCommitTopology<GenericRecord, 
PinotSinkCommittable> from Flink in order to add a global committer. It works 
in several stages:
   
   1. The operator is responsible for sending serialized Avro records directly 
to the sink.
   2. The sink writer is responsible for building segments and flushing them to 
disk.
   3. The sink committer (before global commit) is responsible for uploading 
the segments to a location that is reachable by all nodes in the Flink cluster 
(in my case, the S3 deep store).
   4. The global sink committer executes the segment replacement protocol 
defined in the Pinot SDK.
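
   The hand-offs between these four stages can be sketched in plain Java as 
follows. This is only a model under stated assumptions, not the connector 
itself: it uses no Flink or Pinot classes, `StagedSinkSketch` and `Committable` 
are hypothetical stand-ins (the latter for PinotSinkCommittable), records are 
plain strings rather than Avro, and the paths and deep store URI are 
placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of the four stages; no Flink or Pinot APIs. */
final class StagedSinkSketch {

    /** Hand-off object between stages: a segment name and where it lives. */
    static final class Committable {
        final String segmentName;
        final String location;

        Committable(String segmentName, String location) {
            this.segmentName = segmentName;
            this.location = location;
        }
    }

    // Stage 1: the operator serializes each record before sending it on.
    static byte[] serialize(String row) {
        return row.getBytes(StandardCharsets.UTF_8);
    }

    // Stage 2: the writer groups records into segments on local disk.
    static List<Committable> buildSegments(List<byte[]> records, int rowsPerSegment) {
        if (rowsPerSegment <= 0) {
            throw new IllegalArgumentException("rowsPerSegment must be positive");
        }
        List<Committable> local = new ArrayList<>();
        int index = 0;
        for (int start = 0; start < records.size(); start += rowsPerSegment) {
            String name = "segment_" + index++;
            local.add(new Committable(name, "file:///tmp/" + name)); // placeholder path
        }
        return local;
    }

    // Stage 3: the committer copies a segment to storage all nodes can reach.
    static Committable upload(Committable local, String deepStoreUri) {
        return new Committable(local.segmentName, deepStoreUri + "/" + local.segmentName);
    }

    // Stage 4: the global committer sees every uploaded segment at once.
    // The real connector would run the segment replacement protocol here.
    static List<String> globalCommit(List<Committable> uploaded) {
        List<String> published = new ArrayList<>();
        for (Committable c : uploaded) {
            published.add(c.location);
        }
        return List.copyOf(published);
    }

    /** Runs all four stages in order and returns the published locations. */
    static List<String> run(List<String> rows, int rowsPerSegment, String deepStoreUri) {
        List<byte[]> records = new ArrayList<>();
        for (String row : rows) {
            records.add(serialize(row));
        }
        List<Committable> uploaded = new ArrayList<>();
        for (Committable c : buildSegments(records, rowsPerSegment)) {
            uploaded.add(upload(c, deepStoreUri));
        }
        return globalCommit(uploaded);
    }
}
```

   The point of the split is that nothing is published until stage 4, so a 
failure in stages 1 to 3 never touches the table.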
   
   This sink is currently only compatible with REFRESH tables that replace all 
segments on every job execution. It takes care of atomically replacing the 
segments for the table, and performs well because it does the hard work up 
front. I am open to sharing this code so that it can be merged into the Pinot 
repository, but it does have some limitations:
   
   - No checkpointing
   - Only BATCH execution mode is supported at the moment
   - Only REFRESH tables are supported at the moment (full segment replacement)
   - The connector currently bypasses certain Pinot conventions (such as using 
certain attributes defined in the batch config, and so on). This would need to 
be reviewed carefully to ensure the code is in line with the rest of the 
repository.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

