[ https://issues.apache.org/jira/browse/ORC-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978633#comment-16978633 ]

Gopal Vijayaraghavan commented on ORC-570:
------------------------------------------

bq. What am I missing?

The SSD cache: the FS object is never used to read the date_dim table (75k rows,
~30ms to read), but the FS init dance takes approximately 600ms (multiple SSL
connections to S3 to issue a HEAD on '/' to check that the bucket is valid,
DynamoDB init for S3Guard and, very soon, another hop to get IDBroker role
information), which overshadows the actual read latency.

Since the output broadcast also goes out over the SSDs, that isn't really a
good enough reason to init the FS (see below for why we didn't want ORC to do
the fs.get() itself).

bq. It already postpones creating the FileSystem until it needs it.

That was creating an unbound FS object, so we wanted to regain control of the
FS being used, in order to close it and fix leaks (there's probably a rant
about UGI + threading here).

So we started passing it in via the options, which forces early creation of
the FS just to call .fileSystem(FileSystem) (rather than letting ORC do the get).
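Roughly, that eager pattern looks like the minimal sketch below. `FakeFileSystem` and `Options` are hypothetical stand-ins (not Hadoop's `FileSystem` or ORC's `ReaderOptions`); constructing `FakeFileSystem` models the expensive init, and the caller owns and closes it, so the init cost is paid even when the reader never touches the FS.

```java
import java.io.Closeable;

// Sketch only: FakeFileSystem and Options are hypothetical stand-ins,
// not org.apache.hadoop.fs.FileSystem or ORC's ReaderOptions.
class EagerFsOwnership {
    // Constructing this models the expensive FS init (SSL, S3 HEAD, S3Guard).
    static final class FakeFileSystem implements Closeable {
        static int opened = 0;
        static int closed = 0;

        FakeFileSystem() { opened++; }

        @Override
        public void close() { closed++; }
    }

    // Options holder with an eager .fileSystem(fs) setter.
    static final class Options {
        private FakeFileSystem fs;

        Options fileSystem(FakeFileSystem fs) {
            this.fs = fs;
            return this;
        }

        FakeFileSystem getFileSystem() { return fs; }
    }

    // The caller creates the FS up front so it can close it afterwards,
    // even when the reader never uses it (e.g. data served from a cache).
    static void openReaderWithOwnedFs(boolean readerNeedsFs) {
        try (FakeFileSystem fs = new FakeFileSystem()) {
            Options opts = new Options().fileSystem(fs);
            if (readerNeedsFs) {
                opts.getFileSystem(); // the only path that actually uses the FS
            }
        }
    }

    public static void main(String[] args) {
        openReaderWithOwnedFs(false);
        // Prints "opened=1 closed=1": the init cost was paid for nothing.
        System.out.println("opened=" + FakeFileSystem.opened
                + " closed=" + FakeFileSystem.closed);
    }
}
```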

Because the ReaderOptions class isn't final, we could override it with a 
subclass named EncodedReaderOptions, but the DataReaderOptions couldn't be 
overridden.
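A minimal sketch of the subclass route, assuming hypothetical `BaseReaderOptions` and `LazyReaderOptions` classes (loosely modeled on the EncodedReaderOptions idea, not its actual code): because the base class isn't final, the subclass can override the getter and only build the FS on first access.

```java
import java.util.function.Supplier;

// Sketch only: every class here is a hypothetical stand-in, not ORC/Hive code.
class SubclassOverrideSketch {
    static final class FakeFileSystem {
        static int created = 0;

        FakeFileSystem() { created++; }
    }

    // Non-final options class, so a subclass may override its getter.
    static class BaseReaderOptions {
        private FakeFileSystem fs;

        BaseReaderOptions filesystem(FakeFileSystem fs) {
            this.fs = fs;
            return this;
        }

        FakeFileSystem getFilesystem() { return fs; }
    }

    // Subclass that defers creation until the FS is actually requested.
    // Not thread-safe; a real version would need synchronization.
    static class LazyReaderOptions extends BaseReaderOptions {
        private final Supplier<FakeFileSystem> supplier;
        private FakeFileSystem cached;

        LazyReaderOptions(Supplier<FakeFileSystem> supplier) {
            this.supplier = supplier;
        }

        @Override
        FakeFileSystem getFilesystem() {
            if (cached == null) {
                cached = supplier.get(); // first access pays the init cost
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        LazyReaderOptions opts = new LazyReaderOptions(FakeFileSystem::new);
        System.out.println("before=" + FakeFileSystem.created); // before=0
        opts.getFilesystem();
        opts.getFilesystem();
        System.out.println("after=" + FakeFileSystem.created);  // after=1
    }
}
```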

See HIVE-22499 
https://issues.apache.org/jira/secure/attachment/12985914/HIVE-22499.WIP.patch 
for the original attempt to fix this without touching the API directly.

However, that approach failed for DataReaderOptions: its builder pattern uses
a private constructor, which makes it impossible to override without an ABI
change anyway.
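A sketch of why the builder case is different, with hypothetical `DataOptions` and `Builder` classes standing in for DataReaderOptions (not ORC's actual code): with only a private constructor there is nothing for an outside subclass to hook into, so a lazy path means adding a new builder method (here the hypothetical `withFileSystemSupplier`), i.e. an API change.

```java
import java.util.function.Supplier;

// Sketch only: DataOptions and FakeFileSystem are hypothetical stand-ins
// for a builder-only options class, not ORC's actual classes.
class PrivateCtorBuilderSketch {
    static final class FakeFileSystem {
        static int created = 0;

        FakeFileSystem() { created++; }
    }

    // Only constructible through its Builder: the private constructor (plus
    // final) leaves no hook for an outside subclass, so adding laziness means
    // changing this class's API.
    static final class DataOptions {
        private final Supplier<FakeFileSystem> fsSupplier;
        private FakeFileSystem fs;

        private DataOptions(Builder b) { this.fsSupplier = b.fsSupplier; }

        static Builder builder() { return new Builder(); }

        // Memoized so the supplier runs at most once per options object.
        synchronized FakeFileSystem getFileSystem() {
            if (fs == null) {
                fs = fsSupplier.get();
            }
            return fs;
        }

        static final class Builder {
            private Supplier<FakeFileSystem> fsSupplier;

            private Builder() {}

            // Eager style: the caller must already have built the FS.
            Builder withFileSystem(FakeFileSystem fs) {
                this.fsSupplier = () -> fs;
                return this;
            }

            // The kind of new method that requires an API/ABI change.
            Builder withFileSystemSupplier(Supplier<FakeFileSystem> supplier) {
                this.fsSupplier = supplier;
                return this;
            }

            DataOptions build() { return new DataOptions(this); }
        }
    }

    public static void main(String[] args) {
        DataOptions opts = DataOptions.builder()
                .withFileSystemSupplier(FakeFileSystem::new)
                .build();
        System.out.println("before=" + FakeFileSystem.created); // before=0
        opts.getFileSystem();
        System.out.println("after=" + FakeFileSystem.created);  // after=1
    }
}
```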

> FS: ReaderOptions.filesystem should also accept a lazy Supplier 
> ----------------------------------------------------------------
>
>                 Key: ORC-570
>                 URL: https://issues.apache.org/jira/browse/ORC-570
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Gopal Vijayaraghavan
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: ORC-570.WIP.patch, ORC-570.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> FileSystem initialization is not always necessary when ReaderOptions are
> built for files, particularly if an OrcTail is provided to the reader by
> another process.
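A minimal sketch of what a lazy Supplier buys here, assuming hypothetical `FakeFileSystem`, `FakeTail` (standing in for an OrcTail) and `readFooter` names; it is not ORC's actual API. The FS is wrapped in a thread-safe memoizing `Supplier`, so it is built at most once, and never when the tail was provided.

```java
import java.util.function.Supplier;

// Sketch only: FakeFileSystem, FakeTail and readFooter are hypothetical
// stand-ins for a FileSystem, an OrcTail and the reader's footer lookup.
class LazyFsSupplierSketch {
    // Thread-safe memoizing wrapper: delegate.get() runs at most once,
    // assuming it never returns null.
    static final class Memoized<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private volatile T value;

        Memoized(Supplier<T> delegate) { this.delegate = delegate; }

        @Override
        public T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        v = delegate.get();
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    static final class FakeFileSystem {
        static int created = 0;

        FakeFileSystem() { created++; }
    }

    static final class FakeTail {
        final String footer;

        FakeTail(String footer) { this.footer = footer; }
    }

    // The FS is only needed when no tail was handed over by another process.
    static String readFooter(FakeTail tail, Supplier<FakeFileSystem> fs) {
        if (tail != null) {
            return tail.footer; // served from the provided tail; FS never built
        }
        fs.get(); // a real reader would open the file and parse its footer here
        return "footer-from-fs";
    }

    public static void main(String[] args) {
        Supplier<FakeFileSystem> fs = new Memoized<>(FakeFileSystem::new);
        System.out.println(readFooter(new FakeTail("cached-footer"), fs)); // cached-footer
        System.out.println("created=" + FakeFileSystem.created);          // created=0
        readFooter(null, fs);
        readFooter(null, fs);
        System.out.println("created=" + FakeFileSystem.created);          // created=1
    }
}
```

The double-checked locking on a `volatile` field keeps the already-initialized path lock-free; a plain `synchronized get()` would also be correct, just with a lock on every call.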



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
