ksmatharoo commented on issue #1617:
URL: https://github.com/apache/iceberg/issues/1617#issuecomment-1515953429
> I don't think it is a good idea in general to use relative paths. We recently had an issue where using a `hdfs` location without authority caused a user's data to be deleted by the `RemoveOrphanFiles` action because the resolution of the table root changed. The main problem is that places in Iceberg would need to have some idea of "equivalent" paths and path resolution. Full URIs are much easier to work with and more reliable.
>
> But there is still a way to do both. Catalogs and tables can inject their own `FileIO` implementation, which is what is used to open files. That can do any resolution that you want based on environment. So you could use an implementation that allows you to override a portion of the file URI and read it from a different underlying location. I think that works better overall because there are no mistakes about equivalent URIs, but you can still read a table copy without rewriting the metadata.
@rdblue We tried injecting our own `FileIO`, which replaces the table/metadata path prefix with the new location. This works while reading the table metadata, but reading the Parquet data files fails in `BatchDataReader.java`; both the `FileIO` and the failing function are shown below. Please share your thoughts if there is some other way of achieving this.
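Roughly what our injected `FileIO` looks like (a minimal sketch, not our exact code: the class name and the `io.prefix.*` property keys are placeholders, and it simply delegates to Iceberg's `HadoopFileIO`):

```java
import java.util.Map;
import org.apache.iceberg.hadoop.HadoopFileIO;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

// Sketch of a FileIO that rewrites a path prefix before delegating.
public class PrefixRewritingFileIO extends HadoopFileIO {
  private String sourcePrefix;
  private String targetPrefix;

  @Override
  public void initialize(Map<String, String> properties) {
    super.initialize(properties);
    // Placeholder property keys, supplied via catalog/table properties.
    this.sourcePrefix = properties.get("io.prefix.source");
    this.targetPrefix = properties.get("io.prefix.target");
  }

  private String rewrite(String path) {
    return path.startsWith(sourcePrefix)
        ? targetPrefix + path.substring(sourcePrefix.length())
        : path;
  }

  @Override
  public InputFile newInputFile(String path) {
    return super.newInputFile(rewrite(path));
  }

  @Override
  public OutputFile newOutputFile(String path) {
    return super.newOutputFile(rewrite(path));
  }
}
```

This is the function in `BatchDataReader.java` where the read fails: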
```java
protected CloseableIterator<ColumnarBatch> open(FileScanTask task) {
  String filePath = task.file().path().toString();
  LOG.debug("Opening data file {}", filePath);

  // update the current file for Spark's filename() function
  InputFileBlockHolder.set(filePath, task.start(), task.length());

  Map<Integer, ?> idToConstant = constantsMap(task, expectedSchema());

  // The line below causes the issue: it looks up the path recorded in the
  // metadata in a map whose keys have been replaced by our custom FileIO.
  // Changing it to `InputFile inputFile = table.io().newInputFile(filePath);`
  // makes it work, but that bypasses the encryption logic. In short, we could
  // not make it work with a custom FileIO alone.
  InputFile inputFile = getInputFile(filePath);
  Preconditions.checkNotNull(inputFile, "Could not find InputFile associated with FileScanTask");

  SparkDeleteFilter deleteFilter =
      task.deletes().isEmpty()
          ? null
          : new SparkDeleteFilter(filePath, task.deletes(), counter());

  return newBatchIterable(
          inputFile,
          task.file().format(),
          task.start(),
          task.length(),
          task.residual(),
          idToConstant,
          deleteFilter)
      .iterator();
}
```
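One direction that might work around this (a sketch under our assumptions, not a confirmed fix): instead of letting the custom `FileIO` change the location that the returned `InputFile` reports, redirect only where the bytes are read from and keep `location()` returning the original metadata path. The map consulted by `getInputFile(filePath)` appears to be keyed by `InputFile.location()`, so preserving the original path keeps the lookup by `task.file().path()` working and leaves the encryption wiring untouched. The class below is hypothetical:

```java
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.SeekableInputStream;

// Hypothetical: reads bytes from a relocated copy of the file while still
// reporting the original metadata path from location(), so map lookups that
// key on InputFile.location() continue to match task.file().path().
class LocationPreservingInputFile implements InputFile {
  private final InputFile delegate; // opened against the rewritten path
  private final String originalLocation; // path as recorded in table metadata

  LocationPreservingInputFile(InputFile delegate, String originalLocation) {
    this.delegate = delegate;
    this.originalLocation = originalLocation;
  }

  @Override
  public long getLength() {
    return delegate.getLength();
  }

  @Override
  public SeekableInputStream newStream() {
    return delegate.newStream();
  }

  @Override
  public String location() {
    return originalLocation; // preserve the metadata path as the lookup key
  }

  @Override
  public boolean exists() {
    return delegate.exists();
  }
}
```

The custom `FileIO` would then return `new LocationPreservingInputFile(super.newInputFile(rewrite(path)), path)` from `newInputFile(String)`.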