kbendick commented on PR #5516:
URL: https://github.com/apache/iceberg/pull/5516#issuecomment-1216075644

   > While there are definitely caveats like renaming column name case and 
presence of V2 delete files that we should warn about, I also agree with 
@jackye1995 , generate SymlinkManifest seems quite useful, we have seen some 
asks for interop of data in iceberg table to non-Iceberg systems (in house 
file-based tools for instance).
   > 
   > 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 seems a pretty standard way to provide file listing, and both 
generating/reading seems supported in a variety of data warehouse systems.
   
   I strongly share @rdblue’s concerns about properly reading deletes and can 
see potential concerns about schema evolution, but I have seen that the symlink 
text input format is pretty useful for interop and _usually, in my experience_ 
it’s (re)loaded every time it’s needed. But that might not be the common case. 
Does it have to be rebuilt entirely every time?
   
   I think it has a lot of value for interop with specific BI tools that are 
used for presentations etc., where the end user really doesn’t have a choice.
   
   In my experience, I’ve seen this needed especially for specific one-off 
things like financial report demonstrations where the BI tool is for some 
reason set in stone.
   
   I’m not familiar enough with the format and the concerns around schema 
evolution (again, my usage of it has usually been one-off, or rebuild / reload 
on every “refresh”), but I do think this would be good for Iceberg overall.
   
   But ensuring the schema is correct and deletes are properly applied on 
every creation should be done with a lot of care.
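
   As a rough illustration of that care, here is a minimal sketch in Python. 
It assumes the caller has already collected the current snapshot's data file 
paths and a count of its delete files; the function names, the `manifest` 
file name, and the refuse-on-deletes policy are all hypothetical and are not 
taken from the PR or from any Iceberg API. It only shows the idea of writing 
one data file path per line and rebuilding the listing from scratch each time.

```python
from pathlib import Path
from typing import Iterable


def build_symlink_manifest(data_files: Iterable[str], delete_file_count: int) -> str:
    """Return manifest text: one data file path per line.

    Hypothetical policy: refuse to build a manifest when the snapshot has
    delete files, because a plain file listing cannot apply them.
    """
    if delete_file_count > 0:
        raise ValueError(
            f"snapshot has {delete_file_count} delete file(s); "
            "a symlink manifest would expose deleted rows"
        )
    # De-duplicate and sort so that rebuilding yields a stable file.
    paths = sorted(set(data_files))
    return "".join(f"{path}\n" for path in paths)


def write_symlink_manifest(
    manifest_dir: str, data_files: Iterable[str], delete_file_count: int
) -> Path:
    """Rebuild the manifest from scratch and write it under manifest_dir."""
    text = build_symlink_manifest(data_files, delete_file_count)
    out_dir = Path(manifest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "manifest"  # hypothetical file name
    out_path.write_text(text)
    return out_path
```

   Calling `write_symlink_manifest` on every “refresh” matches the rebuild / 
reload usage described above; schema checks are deliberately left out of this 
sketch.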
   
   This seems worthwhile at least for BI tools that wouldn’t support a custom 
format at all (as this is much less efficient than reading the Iceberg table 
directly). I share the concern about giving other tools a reason to skip 
supporting Iceberg natively. But maybe this will provide those tools more of 
an incentive in the long run.
   
   My 2 cents on the matter. 100% happy to learn more about the format’s 
specifics to help support it if need be. 🙂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

