Thanks for the heads up on this. It sounds like this won't affect most people, but we should definitely call it out with a warning in our maintenance docs. Would you like to open a PR for that?

On Fri, Sep 11, 2020 at 3:45 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> Because the RemoveOrphanFilesAction uses Filesystem.list, the paths of
> files found in the file system can have an authority included in them based
> on the core-site.xml. This is determined when listing the files so the
> entries stored in the metadata tables do not necessarily have to match.
> URIs will have the same scheme and path but can have a potentially
> different authority. This means when doing a string matching join in
> Spark, the files found on the system will not match those listed in the
> metadata table and the action will determine that the files are no longer
> required and delete them. This leads to removing all the files that are
> listed with a different authority.
>
> This will only affect you if you have changed authorities between writing
> and running RemoveOrphanFilesAction I believe.
> We are doing more investigation but because of the potential for data loss
> I thought it important to share with the dev-list.
>
> If your authority has not changed, or will not change there should be no
> issues.
>
> Thanks for your time,
> Russ
>

--
Ryan Blue
Software Engineer
Netflix
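
For anyone who wants to see the mismatch described in the quoted message, here is a minimal Python sketch. It is not Iceberg's code and does not use Spark. The host names, paths, and helper functions are hypothetical and only model the idea: a plain string comparison treats every listed file as an orphan once the authority differs, while a comparison on scheme and path alone does not.

```python
from urllib.parse import urlsplit

# Hypothetical location recorded in the table metadata at write time.
metadata_files = ["hdfs://nn-old:8020/warehouse/db/t/data/f1.parquet"]

# Hypothetical listing result after the authority configured in
# core-site.xml has changed: same scheme and path, different authority.
listed_files = [
    "hdfs://nn-new:8020/warehouse/db/t/data/f1.parquet",
    "hdfs://nn-new:8020/warehouse/db/t/data/stray.parquet",
]


def scheme_and_path(uri):
    """Return (scheme, path) for a URI, dropping the authority (host:port)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.path


def orphans_by_string(listed, known):
    """Model of the string-matching comparison: full URIs must be identical."""
    known_set = set(known)
    return [f for f in listed if f not in known_set]


def orphans_by_scheme_and_path(listed, known):
    """Illustrative alternative: compare on scheme and path, ignoring authority."""
    known_keys = {scheme_and_path(f) for f in known}
    return [f for f in listed if scheme_and_path(f) not in known_keys]


if __name__ == "__main__":
    # Every listed file is reported, including the live data file f1.parquet.
    print(orphans_by_string(listed_files, metadata_files))
    # Only the file that is absent from the metadata is reported.
    print(orphans_by_scheme_and_path(listed_files, metadata_files))
```

Ignoring the authority is shown here only to make the difference visible. Whether that is a safe fix is a separate question, since two different authorities could also point at genuinely different file systems.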