haibow commented on issue #4225: Make Pinot schema evolution easier
URL: 
https://github.com/apache/incubator-pinot/issues/4225#issuecomment-521824275
 
 
   It seems reload doesn't work well with REALTIME tables: the table ends up 
with segments that have inconsistent schemas.
   
   For realtime tables, after updating the schema and calling the `reload` 
endpoint, all ONLINE segments are reloaded with the new schema, but the 
CONSUMING segment is skipped. As a result, the consuming segment keeps 
consuming with the old schema and eventually seals with it.
   
   Tested on a realtime table with LLC (code last checked in from master on 
[04/11/19](https://github.com/apache/incubator-pinot/commit/26330f3a2c3309e7cf574e1fff86a1de9fb934ff)).
 The segment that was consuming at the time of the reload is the only segment 
left with the old schema, both while it is in the CONSUMING state and later 
once it is ONLINE. 
   
   Impact:
   - when querying data within a small time range after the time of the reload, 
the field added by the new schema is not returned in the query result.
   - when querying data with a somewhat larger time range, we see the message 
below:
   `MergeResponseError: responses for table: $table from servers: [$server1, 
$server2] got dropped due to data schema inconsistency.`
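
To make the second impact concrete, here is a toy model of how mixed schemas could lead to dropped responses. It is not Pinot's broker code. It assumes the merge step takes the first response's column list as the reference and discards any response whose columns differ. The function name `merge_responses`, the server names, and the columns `userId` and `country` are all made up for illustration.

```python
def merge_responses(responses):
    """Toy merge: keep rows whose columns match the first response.

    responses: list of (server, columns, rows) tuples.
    Returns (reference_columns, merged_rows, dropped_servers).
    Assumption: the first response defines the reference schema.
    """
    if not responses:
        return (), [], []
    reference = responses[0][1]
    merged, dropped = [], []
    for server, columns, rows in responses:
        if columns == reference:
            merged.extend(rows)
        else:
            # Schema mismatch: this server's rows are left out of the result.
            dropped.append(server)
    return reference, merged, dropped


# server1 answers from reloaded segments (new schema); server2 answers from
# the segment that was consuming during the reload (old schema).
responses = [
    ("server1", ("userId", "country"), [(1, "US"), (2, "DE")]),
    ("server2", ("userId",), [(3,)]),
]
columns, rows, dropped = merge_responses(responses)
```

In this model `dropped` ends up holding `server2`, so the rows from the stale segment silently go missing from the merged result.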
   
   Reloading the table/segment again after the consuming segment seals picks 
up the new schema and brings the whole table back to a healthy state, but 
it's operationally inefficient.
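
The sequence described above can be replayed with a small simulation. This is only a sketch of the observed behavior, not Pinot code: `Segment`, `reload`, `seal`, and `stale` are hypothetical helpers, and the schemas are plain tuples with made-up column names.

```python
from dataclasses import dataclass

OLD_SCHEMA = ("userId", "ts")
NEW_SCHEMA = ("userId", "ts", "country")  # hypothetical added column


@dataclass
class Segment:
    name: str
    state: str  # "ONLINE" or "CONSUMING"
    schema: tuple


def reload(segments, schema):
    """Model of the observed reload: only ONLINE segments pick up the schema."""
    for seg in segments:
        if seg.state == "ONLINE":
            seg.schema = schema


def seal(seg):
    """The consuming segment completes and keeps whatever schema it had."""
    seg.state = "ONLINE"


def stale(segments, schema):
    """Names of segments that do not yet carry the given schema."""
    return [s.name for s in segments if s.schema != schema]


segments = [
    Segment("seg_0", "ONLINE", OLD_SCHEMA),
    Segment("seg_1", "ONLINE", OLD_SCHEMA),
    Segment("seg_2", "CONSUMING", OLD_SCHEMA),
]

reload(segments, NEW_SCHEMA)
after_first_reload = stale(segments, NEW_SCHEMA)   # consuming segment skipped
seal(segments[2])
after_seal = stale(segments, NEW_SCHEMA)           # sealed with the old schema
reload(segments, NEW_SCHEMA)
after_second_reload = stale(segments, NEW_SCHEMA)  # second reload catches it
```

Only the second reload, issued after the seal, clears the stale segment, which is why the workaround needs someone to time it.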
    
   So this looks more like a bug now. We might need to revisit approaches like 
   - flushing the consuming segment (and then reloading)
   - adding the new columns in memory and refreshing the schema before 
consuming more rows (without a forced flush)
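
A rough sketch of what the second approach could look like, under heavy assumptions: `MutableSegment` is a hypothetical stand-in for the in-memory consuming segment, stored as one list per column, and the string `"null"` is just a placeholder default value. The idea is to backfill already-consumed rows with the default and then keep consuming with the new column in place.

```python
class MutableSegment:
    """Hypothetical in-memory consuming segment (one list per column)."""

    def __init__(self, defaults):
        # defaults: column name -> value used when a row lacks that field
        self.defaults = dict(defaults)
        self.columns = {name: [] for name in self.defaults}
        self.num_rows = 0

    def index(self, row):
        """Append one row; fields not in the current schema are ignored."""
        for name, values in self.columns.items():
            values.append(row.get(name, self.defaults[name]))
        self.num_rows += 1

    def add_column(self, name, default):
        """Add a column mid-consumption, backfilling existing rows."""
        if name in self.columns:
            return
        self.defaults[name] = default
        self.columns[name] = [default] * self.num_rows


segment = MutableSegment({"userId": 0})
segment.index({"userId": 1})
segment.index({"userId": 2})
segment.add_column("country", "null")  # schema refreshed, no forced flush
segment.index({"userId": 3, "country": "US"})
```

Rows consumed before the schema change get the default, and rows consumed afterwards carry the real value, so the segment would seal with the new schema.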
   
   @mayankshriv @Jackie-Jiang thoughts?


