jccampagne commented on PR #2207:
URL: https://github.com/apache/arrow-rs/pull/2207#issuecomment-1200300249
> tustvold:
>
> ```
> // In tree symlink gets resolved and deduplicated
> // In tree symlink gets resolved and canonicalised to actual path
> ```
>
> Symlinks that are within the `LocalFileSystem` root are resolved to paths,
> even if these then aren't prefixes of the search path

What's the rationale behind resolving symlinks to their actual paths, and
deduplicating them?
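To make the behaviour under discussion concrete, here is a minimal sketch using only the Rust standard library; it is not the `object_store` implementation. `resolve_and_dedup` and `demo_dedup` are hypothetical helpers, the temporary directory layout is invented for illustration, and the demo assumes a Unix platform for `std::os::unix::fs::symlink`. Resolving each path with `fs::canonicalize` and collecting the results into a set is one way for two links to the same file to collapse into a single entry, as the quoted comments describe.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::PathBuf;

/// Hypothetical helper: resolve every path to its canonical (absolute,
/// symlink-free) form and drop duplicates, returning them in sorted order.
fn resolve_and_dedup(paths: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut seen = BTreeSet::new();
    for p in paths {
        // canonicalize follows every symlink, so two links to the same file
        // produce the same PathBuf and the set keeps only one of them
        seen.insert(fs::canonicalize(p)?);
    }
    Ok(seen.into_iter().collect())
}

/// Hypothetical demo mirroring a client directory with two links to one file.
/// Returns (number of links, number of entries after resolution).
fn demo_dedup() -> io::Result<(usize, usize)> {
    let root = std::env::temp_dir().join(format!("symlink-dedup-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root); // clear leftovers from an earlier run
    let target = root.join("everything").join("w.parquet");
    fs::create_dir_all(root.join("everything"))?;
    fs::create_dir_all(root.join("client3"))?;
    fs::write(&target, b"placeholder, not real Parquet")?;

    let links = vec![
        root.join("client3").join("x.parquet"),
        root.join("client3").join("y.parquet"),
    ];
    for link in &links {
        symlink(&target, link)?;
    }
    let resolved = resolve_and_dedup(&links)?;
    fs::remove_dir_all(&root)?;
    Ok((links.len(), resolved.len()))
}
```

Calling `demo_dedup()` yields `(2, 1)`: two links go in, one resolved path comes out.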
> alamb: I need to think more deeply about what not supporting symlinks
> would mean. I do feel like symlinks are used for many different things locally
> so simply ignoring them seems less than ideal

A use case for symlinks is organising large data files (e.g. Parquet):
instead of moving or copying large datasets, one could organise them by
placing links under different directories.

For example, the directory `/data/everything/` is a repository containing
many (large) files:
```
/data/everything/a.parquet
…
/data/everything/z.parquet
```
One would then create several subsets made of symlinks:
```
/data/client1/a.parquet -> /data/everything/a.parquet
/data/client1/b.parquet -> /data/everything/b.parquet
/data/client2/d.parquet -> /data/everything/d.parquet
/data/client2/e.parquet -> /data/everything/e.parquet
/data/client2/f.parquet -> /data/everything/f.parquet
/data/client3/x.parquet -> /data/everything/w.parquet
/data/client3/y.parquet -> /data/everything/w.parquet # yes, same file, why not?
```
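The layout above can be reproduced for experimentation with a short sketch. `LAYOUT` and `build_layout` are hypothetical names, only the mappings listed above are included (the elided files are not), the target files are placeholders rather than real Parquet data, and the code assumes a Unix platform (`std::os::unix::fs::symlink`) and an initially empty `root`, since creating a link fails if one already exists.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Hypothetical (link, target) pairs from the example, relative to a root.
const LAYOUT: &[(&str, &str)] = &[
    ("client1/a.parquet", "everything/a.parquet"),
    ("client1/b.parquet", "everything/b.parquet"),
    ("client2/d.parquet", "everything/d.parquet"),
    ("client2/e.parquet", "everything/e.parquet"),
    ("client2/f.parquet", "everything/f.parquet"),
    ("client3/x.parquet", "everything/w.parquet"),
    ("client3/y.parquet", "everything/w.parquet"),
];

/// Hypothetical helper: recreate the layout under `root`, writing a placeholder
/// file for each target and an absolute symlink for each link.
fn build_layout(root: &Path, layout: &[(&str, &str)]) -> io::Result<()> {
    for (link, target) in layout {
        let link = root.join(link);
        let target = root.join(target);
        // Make sure the directories on both sides of the link exist
        if let Some(dir) = target.parent() {
            fs::create_dir_all(dir)?;
        }
        if let Some(dir) = link.parent() {
            fs::create_dir_all(dir)?;
        }
        // Several links may share one target; only write it once
        if !target.exists() {
            fs::write(&target, b"placeholder, not real Parquet")?;
        }
        symlink(&target, &link)?;
    }
    Ok(())
}
```

Because `x.parquet` and `y.parquet` both point at `w.parquet`, `fs::read_link` returns the same target for each, and `fs::canonicalize` resolves them to the same path.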
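And we would create a `LocalFileSystem` store for each case/client (inside a
function that returns a `Result`):
```rust
use object_store::local::LocalFileSystem;

// One store per client, rooted at that client's directory of symlinks
let client1 = LocalFileSystem::new_with_prefix("/data/client1")?;
let client2 = LocalFileSystem::new_with_prefix("/data/client2")?;
let client3 = LocalFileSystem::new_with_prefix("/data/client3")?;
```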
Each store would then see the files under their unresolved link paths,
`/data/client*/*.parquet`, rather than as the resolved targets in
`/data/everything/*.parquet`.
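A store that does not resolve links would report each entry under the name it has below the prefix. Below is a minimal sketch of such a listing with the standard library; `list_link_paths` is a hypothetical helper, it only looks one directory level deep, and it is not how `LocalFileSystem` lists files. It relies on `DirEntry::file_type`, which does not follow symlinks on Unix, so each entry is reported as a link rather than as its target.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: list the entries directly under `prefix` by the names
/// they have there, without following symlinks. Returns (name, is_symlink).
fn list_link_paths(prefix: &Path) -> io::Result<Vec<(String, bool)>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(prefix)? {
        let entry = entry?;
        // file_name() is the link's own name and file_type() does not follow
        // the link, so the target's location never appears in the listing
        let name = entry.file_name().to_string_lossy().into_owned();
        let is_symlink = entry.file_type()?.is_symlink();
        entries.push((name, is_symlink));
    }
    // read_dir order is unspecified, so sort for a stable listing
    entries.sort();
    Ok(entries)
}
```

Listing `/data/client3` in the example this way would yield `x.parquet` and `y.parquet`, both flagged as symlinks, even though they share one target.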