lgo opened a new issue #6126:
URL: https://github.com/apache/incubator-pinot/issues/6126


   While using the Spark segment generation job, we had a configuration that 
included settings such as a `sortedColumn`. We initially assumed that the 
segment generation job would sort the data, and we missed that it needed to 
be sorted upstream.
   
   This is documented in the sorted index section 
(https://docs.pinot.apache.org/basics/indexing/forward-index)
   > For offline push, input data needs to be sorted before running Pinot 
segment conversion and push job.
   
   Additionally, it's unclear whether the same happens if users specify a 
partition scheme on a table but do not correctly partition the input data. 
(Searching the docs for "partition" yielded no mention of this.)
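
   To illustrate the kind of partition invariant meant here, a minimal sketch 
follows. It assumes each input file is expected to hold rows for exactly one 
partition and uses a plain modulo on an integer column purely for 
illustration. The function name and the modulo rule are hypothetical and are 
not Pinot's partitioning implementation.

```python
from typing import Any, Iterable, Mapping


def assert_single_partition(
    rows: Iterable[Mapping[str, Any]],
    partition_column: str,
    num_partitions: int,
) -> int:
    """Return the one partition id all rows map to, or raise with an actionable message.

    Assumption: partition id = int(value) % num_partitions (illustrative only).
    """
    seen = {int(row[partition_column]) % num_partitions for row in rows}
    if not seen:
        raise ValueError("Input contains no rows, so no partition can be determined.")
    if len(seen) > 1:
        raise ValueError(
            f"Input rows map to multiple partitions {sorted(seen)} for column "
            f"'{partition_column}' with {num_partitions} partitions. "
            "Partition the input data upstream before generating segments."
        )
    return seen.pop()
```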
   
   This is easy to miss. While pre-processing jobs help 
(https://github.com/apache/incubator-pinot/issues/4353), it would be good to 
prevent the mistake in the first place with invariants and actionable errors.
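
   As a rough sketch of what such an invariant with an actionable error could 
look like, the check below walks the input rows once and fails on the first 
row that breaks the ordering of the sorted column. The names 
`UnsortedInputError` and `assert_sorted` are hypothetical, rows are assumed 
to be dict-like records, and nothing here reflects existing Pinot code.

```python
from typing import Any, Iterable, Mapping


class UnsortedInputError(ValueError):
    """Raised when input rows are not ordered by the configured sorted column."""


def assert_sorted(rows: Iterable[Mapping[str, Any]], sorted_column: str) -> int:
    """Return the number of rows checked, or raise on the first out-of-order row."""
    previous = None
    count = 0
    for index, row in enumerate(rows):
        value = row[sorted_column]
        # Compare each row with its predecessor; equal values are allowed.
        if index > 0 and value < previous:
            raise UnsortedInputError(
                f"Row {index} has {sorted_column}={value!r}, which is less than "
                f"the previous value {previous!r}. Sort the input by "
                f"'{sorted_column}' before running segment generation."
            )
        previous = value
        count += 1
    return count
```

   A check like this could run at the start of the generation job so that the 
failure points at the offending row instead of silently producing an unsorted 
segment.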
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


