jamesyfshao opened a new issue #4261: implement upsert support on pinot
URL: https://github.com/apache/incubator-pinot/issues/4261
 
 
   Pinot is a distributed real-time OLAP engine. It provides second-level 
data freshness by ingesting Kafka events, and it can manage months of 
historical data loaded from various data sources such as HDFS, schemaless, etc. 
However, Pinot currently functions mostly as an append-only storage system. It 
does not allow existing records to be modified or deleted; the only exception 
is overriding all data within a time range through offline tables. This limits 
Pinot's applicability, because many use cases require updates to their data, 
either because of the nature of their events or because of the need for data 
correction or backfill. To extend Pinot's capabilities and serve more use 
cases, we are going to implement an upsert feature that allows users to update 
existing records in a Pinot table through its Kafka input stream.
   
   Some initial requirements for the upsert project:
   
   1. Support only full updates of a Pinot event (no partial updates)
   
   2. Support only the Kafka-compatible queue ingestion model
   
   3. A single Pinot server/table can handle an ingestion rate of 10k messages/sec
   
   4. Each Pinot server can handle 1 billion records or 1 TB of storage
   
   5. Ingestion latency overhead compared to the non-upsert model: < 1 min
   
   6. Query latency overhead compared to the non-upsert model: < 50%
   
   7. Data retention: < 2 weeks
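   
   To make the "full update" requirement concrete, here is a minimal sketch of 
the intended semantics, not a proposed implementation. It assumes that each 
event carries a primary key and an event timestamp, and that a newer event for 
the same key replaces the stored record as a whole. The class and field names 
(UpsertSketch, Event, primaryKey, timestampMs, payload) are hypothetical and do 
not come from the Pinot code base; the timestamp-based ordering is also an 
assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertSketch {

    /** Hypothetical event shape; the field names are illustrative only. */
    static final class Event {
        final String primaryKey;
        final long timestampMs;
        final String payload;

        Event(String primaryKey, long timestampMs, String payload) {
            this.primaryKey = primaryKey;
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    /** Latest full record seen so far for each primary key. */
    private final Map<String, Event> latestByKey = new HashMap<>();

    /**
     * Full update: the incoming event replaces the stored record entirely.
     * Assumption: an event strictly older than the stored one is ignored.
     * Returns true if the event was applied.
     */
    boolean upsert(Event incoming) {
        Event current = latestByKey.get(incoming.primaryKey);
        if (current != null && current.timestampMs > incoming.timestampMs) {
            return false; // stale event, keep the newer stored record
        }
        latestByKey.put(incoming.primaryKey, incoming);
        return true;
    }

    Event get(String primaryKey) {
        return latestByKey.get(primaryKey);
    }

    int size() {
        return latestByKey.size();
    }

    public static void main(String[] args) {
        UpsertSketch table = new UpsertSketch();
        table.upsert(new Event("order-1", 100L, "status=CREATED"));
        table.upsert(new Event("order-1", 200L, "status=SHIPPED"));
        table.upsert(new Event("order-1", 150L, "status=PACKED")); // older, ignored
        System.out.println(table.get("order-1").payload);
        System.out.println(table.size());
    }
}
```

   In this sketch a query would see exactly one record per key, which is the 
behavior an append-only table cannot provide today. A partial update (merging 
individual columns into the stored record) is deliberately left out, in line 
with requirement 1.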
   
