cmathiesen commented on a change in pull request #678: Add Java code examples 
and update site docs
URL: https://github.com/apache/incubator-iceberg/pull/678#discussion_r362735876
 
 

 ##########
 File path: site/docs/evolution.md
 ##########
 @@ -54,6 +54,11 @@ Iceberg uses unique IDs to track each column in a table. 
When you add a column,
 
 Iceberg table partitioning can be updated in an existing table because queries 
do not reference partition values directly.
 
+When you evolve a partition spec, the old data written with an earlier spec 
remains unchanged. New data is written using the new spec in a new layout. 
Metadata for each of the partition versions is kept separately. Because of 
this, when you start writing queries, you get split planning. This is where 
each partition layout plans files separately using the filter it derives for 
that specific partition layout. Here's a visual representation of a contrived 
example: 
+
+![Partition evolution diagram](img/partition-evolution-diagram.png)
 
 Review comment:
   I see what you mean regarding the first point. This diagram was drawn up 
when we were trying to understand exactly how partition evolution would work 
with queries (I think we emailed you a while back, and you explained the split 
planning approach to us, which led to this diagram). Would it be worth refining 
this diagram a little so it can serve as a visual example of the split planning 
concept? I'll also have a think about another diagram that might fit better 
here :)
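
   To make the split planning idea from the diff above concrete, here is a 
minimal sketch in plain Java. It does not use Iceberg's API: the `DataFile` 
class, the spec IDs (0 = partitioned by month, 1 = partitioned by day), and the 
file paths are all hypothetical, chosen only to show how one date-range query 
filter could be turned into a different partition filter for each layout.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitPlanningSketch {
    // Hypothetical stand-in for a data file's metadata (not Iceberg's DataFile).
    static final class DataFile {
        final String path;
        final int specId;       // assumed: 0 = partitioned by month, 1 = partitioned by day
        final String partition; // "2008-11" under spec 0, "2009-01-02" under spec 1

        DataFile(String path, int specId, String partition) {
            this.path = path;
            this.specId = specId;
            this.partition = partition;
        }
    }

    // Evaluates the query range [from, to] against a file, using the
    // partition filter derived for the spec that wrote that file.
    static boolean mayMatch(DataFile file, LocalDate from, LocalDate to) {
        if (file.specId == 0) {
            // Month layout: the date range becomes a month range.
            YearMonth month = YearMonth.parse(file.partition);
            return !month.isBefore(YearMonth.from(from))
                && !month.isAfter(YearMonth.from(to));
        }
        // Day layout: the date range applies to the day directly.
        LocalDate day = LocalDate.parse(file.partition);
        return !day.isBefore(from) && !day.isAfter(to);
    }

    // Plans a scan over files from both layouts and returns the selected paths.
    static List<String> plan(List<DataFile> files, LocalDate from, LocalDate to) {
        List<String> selected = new ArrayList<>();
        for (DataFile file : files) {
            if (mayMatch(file, from, to)) {
                selected.add(file.path);
            }
        }
        return selected;
    }

    // Contrived table: older files written by month, newer files written by day.
    static List<DataFile> sampleFiles() {
        return Arrays.asList(
            new DataFile("month/2008-11.parquet", 0, "2008-11"),
            new DataFile("month/2008-12.parquet", 0, "2008-12"),
            new DataFile("day/2009-01-02.parquet", 1, "2009-01-02"),
            new DataFile("day/2009-01-05.parquet", 1, "2009-01-05"));
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.parse("2008-12-14");
        LocalDate to = LocalDate.parse("2009-01-03");
        System.out.println(plan(sampleFiles(), from, to));
    }
}
```

   In this sketch the query never names a partition value. The month-layout 
file for 2008-12 is kept because it may contain matching rows, even though only 
part of that month falls in the range, so a row-level filter would still be 
needed after planning.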

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

