jccampagne commented on PR #2207:
URL: https://github.com/apache/arrow-rs/pull/2207#issuecomment-1200300249

   >tustvold:
   >        // In tree symlink gets resolved and deduplicated
   >        // In tree symlink gets resolved and canonicalised to actual path
   > Symlinks that are within the LocalFileSystem root are resolved to paths, 
even if these then aren't prefixes of the search path
   
   
   What's the rationale behind resolving symlinks to their actual paths and 
deduplicating them?
   
   
   > alamb: I need to think more deeply about what not supporting symlinks 
would mean. I do feel like symlinks are used for many different things locally 
so simply ignoring them seems less than ideal
   
   
   One use case for symlinks is organising large data files (e.g. Parquet): 
instead of moving or copying large datasets, one can group them by placing 
links under different directories.
   
   For example:
   
   The directory `/data/everything/` is a repository containing many (large) 
files:
   
   ```
   /data/everything/a.parquet
   …
   /data/everything/z.parquet
   ```
   
   One would then create several subsets using symlinks:
   
   ```
   /data/client1/a.parquet -> /data/everything/a.parquet
   /data/client1/b.parquet -> /data/everything/b.parquet
   
   
   /data/client2/d.parquet -> /data/everything/d.parquet
   /data/client2/e.parquet -> /data/everything/e.parquet
   /data/client2/f.parquet -> /data/everything/f.parquet
   
   /data/client3/x.parquet -> /data/everything/w.parquet
   /data/client3/y.parquet -> /data/everything/w.parquet  # yes, same file, why not?
   
   ```
   
   
   We would then create one store per client (pseudocode):
   
   ```
   LocalFileSystem::new_with_prefix("/data/client1");
   LocalFileSystem::new_with_prefix("/data/client2");
   LocalFileSystem::new_with_prefix("/data/client3");
   ```
   
   
   Each store would then see its files under the unresolved link paths, 
`/data/client*/*.parquet`, rather than under `/data/everything/*.parquet`.
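   
   To make the difference concrete, here is a rough sketch using only the Rust 
standard library on Unix. It is not how `LocalFileSystem` is implemented; the 
helper names (`build_layout`, `list_unresolved`, `list_resolved`) and the 
temporary directory are made up for illustration. It recreates the `client3` 
case and lists the directory both ways:
   
   ```rust
   use std::collections::BTreeSet;
   use std::fs;
   use std::io;
   use std::os::unix::fs::symlink; // Unix-only API
   use std::path::{Path, PathBuf};

   /// Recreate a tiny version of the `client3` layout under `root`
   /// (hypothetical helper): two links pointing at the same file.
   fn build_layout(root: &Path) -> io::Result<PathBuf> {
       let everything = root.join("everything");
       let client3 = root.join("client3");
       fs::create_dir_all(&everything)?;
       fs::create_dir_all(&client3)?;

       let target = everything.join("w.parquet");
       fs::write(&target, b"placeholder bytes, not real parquet")?;

       symlink(&target, client3.join("x.parquet"))?;
       symlink(&target, client3.join("y.parquet"))?;
       Ok(client3)
   }

   /// Paths as a client sees them when links are left unresolved.
   fn list_unresolved(dir: &Path) -> io::Result<BTreeSet<PathBuf>> {
       fs::read_dir(dir)?
           .map(|entry| entry.map(|e| e.path()))
           .collect()
   }

   /// Paths after resolving every entry to its target; collecting into a
   /// set deduplicates links that share a target.
   fn list_resolved(dir: &Path) -> io::Result<BTreeSet<PathBuf>> {
       fs::read_dir(dir)?
           .map(|entry| entry.and_then(|e| fs::canonicalize(e.path())))
           .collect()
   }

   fn main() -> io::Result<()> {
       let root = std::env::temp_dir()
           .join(format!("symlink_demo_{}", std::process::id()));
       let client3 = build_layout(&root)?;

       println!("unresolved: {:?}", list_unresolved(&client3)?);
       println!("resolved:   {:?}", list_resolved(&client3)?);

       fs::remove_dir_all(&root)
   }
   ```
   
   With links left unresolved, `client3` contains two entries (`x.parquet` and 
`y.parquet`); after resolving and deduplicating, it collapses to a single path 
under `everything/`, which is outside the client's prefix.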
   
   
   

