[
https://issues.apache.org/jira/browse/ORC-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978633#comment-16978633
]
Gopal Vijayaraghavan commented on ORC-570:
------------------------------------------
bq. What am I missing?
The SSD cache: the FS object is never used to read the date_dim table (75k
rows = ~30ms), but the FS init dance takes approximately 600ms (multiple SSL
connections to S3 for a HEAD on '/' to check that the bucket is valid, DynamoDB
init for S3Guard, and very soon another hop to get IDBroker role information),
which overshadows the read latency.
Since the output broadcast is also going out over the SSDs, this isn't really a
good enough reason to init the FS (see below for why we didn't want ORC to do
the fs.get()).
bq. It already postpones creating the FileSystem until it needs it.
That was creating an unbound FS object, so we wanted to regain control of the
FS used to close it up & fix leaks (there's probably a rant about UGI +
threading here).
So we started to pass it in via the options, which forces an early creation in
order to call .filesystem(FileSystem) (the setter, not the get).
Because the ReaderOptions class isn't final, we could override it with a
subclass named EncodedReaderOptions, but the DataReaderOptions couldn't be
overridden.
See HIVE-22499
https://issues.apache.org/jira/secure/attachment/12985914/HIVE-22499.WIP.patch
for the original attempt to fix this without touching the API directly.
However, that approach failed for DataReaderOptions: its builder pattern with a
private constructor makes it impossible to override without an ABI change
anyway.
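
For illustration, a minimal sketch of what accepting a lazy Supplier could look
like. This is not the ORC API or the attached patch: FakeFileSystem,
LazyReaderOptions and Memoized are hypothetical stand-ins, and no Hadoop classes
are used so that the example is self-contained. The point is only that building
the options does not pay the init cost, and that the first real use pays it
exactly once.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyFsDemo {

    /** Hypothetical stand-in for an expensive handle such as a Hadoop FileSystem. */
    static final class FakeFileSystem {
        // Counts how many times the "expensive" initialization actually ran.
        static final AtomicInteger INIT_COUNT = new AtomicInteger();

        FakeFileSystem() {
            INIT_COUNT.incrementAndGet();
        }
    }

    /** Wraps a Supplier so that the delegate runs at most once, on the first get(). */
    static final class Memoized<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private T value;
        private boolean initialized;

        Memoized(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T get() {
            if (!initialized) {
                value = delegate.get();
                initialized = true;
            }
            return value;
        }
    }

    /** Hypothetical options holder that accepts either an eager FS or a lazy Supplier. */
    static final class LazyReaderOptions {
        private Supplier<FakeFileSystem> fsSupplier;

        // Eager variant: the caller already created (and owns) the FS.
        LazyReaderOptions filesystem(FakeFileSystem eager) {
            this.fsSupplier = () -> eager;
            return this;
        }

        // Lazy variant: nothing is created until getFilesystem() is called.
        LazyReaderOptions filesystem(Supplier<FakeFileSystem> lazy) {
            this.fsSupplier = new Memoized<>(lazy);
            return this;
        }

        FakeFileSystem getFilesystem() {
            return fsSupplier == null ? null : fsSupplier.get();
        }
    }

    public static void main(String[] args) {
        int before = FakeFileSystem.INIT_COUNT.get();
        LazyReaderOptions options =
            new LazyReaderOptions().filesystem(FakeFileSystem::new);
        System.out.println("inits after building options: "
            + (FakeFileSystem.INIT_COUNT.get() - before));

        options.getFilesystem();
        options.getFilesystem();
        System.out.println("inits after two reads: "
            + (FakeFileSystem.INIT_COUNT.get() - before));
    }
}
```

In this sketch a caller that is served entirely from a cache never calls
getFilesystem(), so the init is skipped; the caller also still controls which
FS instance the supplier hands out and can close it afterwards.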
> FS: ReaderOptions.filesystem should also accept a lazy Supplier
> ----------------------------------------------------------------
>
> Key: ORC-570
> URL: https://issues.apache.org/jira/browse/ORC-570
> Project: ORC
> Issue Type: Bug
> Reporter: Gopal Vijayaraghavan
> Assignee: Mustafa Iman
> Priority: Major
> Attachments: ORC-570.WIP.patch, ORC-570.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> FileSystem initialization is not always necessary when ReaderOptions are
> built out for files, particularly if an OrcTail is provided for the reader
> from another process.