Hello everyone,
I would like your thoughts on how we handle HDFS reads. So far, when you
submit a query that uses the collection or document function, the system
first checks the local file system: if the path exists there, the query
runs as normal, but if it does not exist locally, the system reads from
HDFS instead. This causes a problem when the same path exists on both the
local file system and HDFS, because the local copy is always the one read.
We have thought of two ways around this. One is to have the user include
a prefix in the path: 'file://' for the local file system and 'hdfs://'
for HDFS. The other is to add another argument, something like
--filesystem='hdfs', to select HDFS. The first is simpler, but you cannot
use relative paths; the second just adds another argument to the CLI.
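To make the comparison concrete, here is a minimal Java sketch of how each
option might decide where a path is read from. The class, its methods and
the Target enum are hypothetical, not code we already have; the
"namenode:8020" host in the example is also made up. Option 1 relies on
java.net.URI to pull out the scheme, and it rejects an argument with no
scheme, which is exactly why relative paths stop working. Option 2 leaves
the path untouched and lets the flag decide.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper: decides which file system a collection()/doc() argument is read from. */
public class FileSystemResolver {

    /** The two file systems a path can refer to. */
    enum Target { LOCAL, HDFS }

    /** Which file system to read from, plus the path within it. */
    static final class Resolved {
        final Target target;
        final String path;

        Resolved(Target target, String path) {
            this.target = target;
            this.path = path;
        }

        @Override
        public String toString() {
            return target + ":" + path;
        }
    }

    /**
     * Option 1: the prefix (URI scheme) selects the file system.
     * An argument without a scheme is rejected, so relative paths cannot be used.
     */
    static Resolved resolveByScheme(String argument) {
        URI uri = URI.create(argument);
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("expected a 'file://' or 'hdfs://' prefix: " + argument);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "file":
                return new Resolved(Target.LOCAL, uri.getPath());
            case "hdfs":
                return new Resolved(Target.HDFS, uri.getPath());
            default:
                throw new IllegalArgumentException("unsupported scheme: " + scheme);
        }
    }

    /**
     * Option 2: a hypothetical --filesystem flag selects the file system for every
     * path in the query, so the path itself can stay relative. A null flag means local.
     */
    static Resolved resolveByFlag(String path, String filesystemFlag) {
        Target target = "hdfs".equalsIgnoreCase(filesystemFlag) ? Target.HDFS : Target.LOCAL;
        return new Resolved(target, path);
    }

    public static void main(String[] args) {
        System.out.println(resolveByScheme("file:///data/books"));              // LOCAL:/data/books
        System.out.println(resolveByScheme("hdfs://namenode:8020/data/books")); // HDFS:/data/books
        System.out.println(resolveByFlag("data/books", "hdfs"));                // HDFS:data/books
    }
}
```

Note that under option 2 a single flag applies to the whole query, so one
query could not mix local and HDFS inputs, whereas option 1 decides per path.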
Which do you think would be better for us?
Thank you,
Efi