alamb opened a new issue, #9133: URL: https://github.com/apache/arrow-datafusion/issues/9133
### Is your feature request related to a problem or challenge?

After https://github.com/apache/arrow-datafusion/pull/8753 it is now possible to read data from `http` via a `CREATE EXTERNAL TABLE` command:

```sql
❯ create external table hits stored as parquet location 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 rows in set. Query took 0.178 seconds.

❯ describe hits;
+-----------------------+-----------+-------------+
| column_name           | data_type | is_nullable |
+-----------------------+-----------+-------------+
| WatchID               | Int64     | YES         |
| JavaEnable            | Int16     | YES         |
| Title                 | Binary    | YES         |
...
| RefererHash           | Int64     | YES         |
| URLHash               | Int64     | YES         |
| CLID                  | Int32     | YES         |
+-----------------------+-----------+-------------+
105 rows in set. Query took 0.003 seconds.
```

After https://github.com/apache/arrow-datafusion/pull/9064 from @manoj-inukolunu it is possible to `COPY` **to** a remote URL, which is also great.

However, it is not yet possible to select directly from a remote store like

```sql
select * from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
```

### Describe the solution you'd like

I would like to be able to select directly from a remote http source. Today this fails:

```sql
select * from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet' limit 1;
Error during planning: table 'datafusion.public.https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet' not found
```

This works great for local files:

```sql
❯ select * from '/Users/andrewlamb/Downloads/hits.parquet' limit 1;
+---------------------+------------+---------------+-----------+------------+-----------+-----------+------------+----------+----------------------+--------------+----+-----------+---------------------------------------+---------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+------------------------------------------------------------+-----------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+---------------------+---------------------+------+
| WatchID | JavaEnable | Title | GoodEvent | EventTime | EventDate | CounterID | ClientIP | RegionID | UserID | CounterClass | OS | UserAgent | URL | Referer | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash | URLHash | CLID |
+---------------------+------------+---------------+-----------+------------+-----------+-----------+------------+----------+----------------------+--------------+----+-----------+---------------------------------------+---------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+------------------------------------------------------------+-----------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+---------------------+---------------------+------+
| 9153127107923182022 | 1 | Участи NEWSru | 1 | 1373034098 | 15891 | 225510 | 1703485140 | 2 | -6224091410790412093 | 0 | 2 | 3 | http://liver.ru/belgorod/page=1024&wi | | 0 | 0 | 0 | 14328 | 22 | 2038 | 730 | 23 | 15 | 2 | 502 | 0 | 0 | 5 | D� | 1 | 1 | 0 | 0 | | | 4168741 | 0 | 0 | | 0 | 0 | 1058 | 549 | 135 | 2035708370 | 0 | 0 | 0 | 0 | windows | 1601 | 0 | 0 | 0 | 0 | http://video.yandex.ru/uglichnevyj-97442434830%20%D1%8C%20 | 298722980 | 0 | 0 | 0 | 0 | 0 | 5 | 1373021451 | 0 | 0 | 0 | 0 | 0 | 1961866254 | -1 | -1 | -1 | S0 | h1 | | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | | NH | 0 | | | | | | | | | | | 0 | -296158784638538920 | 7011450103338277684 | 0 |
+---------------------+------------+---------------+-----------+------------+-----------+-----------+------------+----------+----------------------+--------------+----+-----------+---------------------------------------+---------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+------------------------------------------------------------+-----------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+---------------------+---------------------+------+
1 row in set. Query took 0.121 seconds.
```

### Describe alternatives you've considered

_No response_

### Additional context

I think the trick is intercepting requested URLs / references in [`DynamicFileCatalog`](https://github.com/apache/arrow-datafusion/blob/dfb6435e16cf4cfd5245c84dd6e18fcf96ac72f2/datafusion-cli/src/catalog.rs#L33) and calling the appropriate object store registration function (e.g. what is in https://github.com/apache/arrow-datafusion/pull/9064).
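
To make the interception idea under "Additional context" a bit more concrete, here is a rough, standard-library-only sketch of the decision the catalog would have to make when it sees a table reference. Everything in it (`TableLocation`, `classify`, `StoreRegistry`, `ensure_store_for`) is a hypothetical name for illustration and not part of DataFusion or the `object_store` crate. A real change would call the existing object store registration code instead of recording keys in a `HashSet`.

```rust
use std::collections::HashSet;

/// Hypothetical classification of a table reference string.
#[derive(Debug, PartialEq)]
enum TableLocation {
    /// A plain table name or local path: no object store is needed.
    Local,
    /// A URL such as `https://host/path`: an object store must be registered.
    Remote { scheme: String, authority: String },
}

/// Decide whether a table reference looks like a remote URL.
/// Assumption: anything of the form `<scheme>://...` other than `file://`
/// is treated as remote.
fn classify(table_ref: &str) -> TableLocation {
    match table_ref.split_once("://") {
        Some((scheme, rest))
            if !scheme.is_empty()
                && scheme
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '+' || c == '-' || c == '.')
                && !scheme.eq_ignore_ascii_case("file") =>
        {
            // The authority (host, bucket, ...) is everything up to the first '/'.
            let authority = rest.split('/').next().unwrap_or("").to_string();
            TableLocation::Remote {
                scheme: scheme.to_ascii_lowercase(),
                authority,
            }
        }
        _ => TableLocation::Local,
    }
}

/// Hypothetical stand-in for the session's object store registry.
/// It only remembers which `scheme://authority` keys have been seen.
struct StoreRegistry {
    registered: HashSet<String>,
}

impl StoreRegistry {
    fn new() -> Self {
        StoreRegistry {
            registered: HashSet::new(),
        }
    }

    /// Called when the catalog is asked for `table_ref`.
    /// Returns true if a new store had to be registered for it.
    fn ensure_store_for(&mut self, table_ref: &str) -> bool {
        match classify(table_ref) {
            TableLocation::Local => false,
            TableLocation::Remote { scheme, authority } => {
                // A real implementation would build and register the store here.
                self.registered.insert(format!("{}://{}", scheme, authority))
            }
        }
    }
}
```

With this shape, the first lookup of `'https://datasets.clickhouse.com/...'` would trigger a registration and later lookups against the same host would reuse it, while local paths like `'/Users/andrewlamb/Downloads/hits.parquet'` would continue down the existing code path.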
