alamb commented on issue #19654: URL: https://github.com/apache/datafusion/issues/19654#issuecomment-3711997077
Hi @AntoinePrv -- I am definitely surprised at this finding (I would expect
DataFusion to do this pretty fast)
I think what is happening is that datafusion is not pushing down the offset
into the parquet scan
To reproduce, using the `hits` dataset from parquet:
```shell
./benchmarks/bench.sh data clickbench_partitioned
```
Then run
```sql
> select * from 'benchmarks/data/hits_partitioned' LIMIT 1;
Elapsed 0.086 seconds.
```
Then run with a large offset
```sql
> select * from 'benchmarks/data/hits_partitioned' LIMIT 1 OFFSET 50000000;
Elapsed 14.284 seconds.
```
You can basically see the problem in the plan (there is no limit / offset in
the datasource exec)
```sql
> explain format indent select * from 'benchmarks/data/hits_partitioned'
LIMIT 1 OFFSET 50000000;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Limit: skip=50000000, fetch=1
|
| | TableScan: benchmarks/data/hits_partitioned
projection=[WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate,
CounterID, ClientIP, RegionID, UserID, CounterClass, OS, UserAgent, URL,
Referer, IsRefresh, RefererCategoryID, RefererRegionID, URLCategoryID,
URLRegionID, ResolutionWidth, ResolutionHeight, ResolutionDepth, FlashMajor,
FlashMinor, FlashMinor2, NetMajor, NetMinor, UserAgentMajor, UserAgentMinor,
CookieEnable, JavascriptEnable, IsMobile, MobilePhone, MobilePhoneModel,
Params, IPNetworkID, TraficSourceID, SearchEngineID, SearchPhrase, AdvEngineID,
IsArtifical, WindowClientWidth, WindowClientHeight, ClientTimeZone,
ClientEventTime, SilverlightVersion1, SilverlightVersion2, SilverlightVersion3,
SilverlightVersion4, PageCharset, CodeVersion, IsLink, IsDownload, IsNotBounce,
FUniqID, OriginalURL, HID, IsOldCounter, IsEvent, IsParameter, DontCountHits,
WithHash, HitColor, LocalEventTime, Age, Sex, Income, Interests, Robotness,
RemoteIP, WindowName, Ope
nerName, HistoryLength, BrowserLanguage, BrowserCountry, SocialNetwork,
SocialAction, HTTPError, SendTiming, DNSTiming, ConnectTiming,
ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID,
SocialSourcePage, ParamPrice, ParamOrderID, ParamCurrency, ParamCurrencyID,
OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID,
RefererHash, URLHash, CLID], fetch=50000001
|
| physical_plan | GlobalLimitExec: skip=50000000, fetch=1
|
| | CoalescePartitionsExec: fetch=50000001
|
| | DataSourceExec: file_groups={16 groups:
[[Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_0.parquet:0..122446530,
Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_1.parquet:0..174965044,
Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_14.parquet:0..151121699,
Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_15.parquet:0..36690483],
[Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_15.parquet:36690483..103098894,
Users/andrewlamb/Software/
> explain format indent select * from 'benchmarks/data/hits_partitioned'
LIMIT 1 OFFSET 50000000;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Limit: skip=50000000, fetch=1
|
| | TableScan: benchmarks/data/hits_partitioned
projection=[WatchID, ...], fetch=50000001
|
| physical_plan | GlobalLimitExec: skip=50000000, fetch=1
|
| | CoalescePartitionsExec: fetch=50000001
|
| | DataSourceExec: file_groups={16 groups: ..., ...],
...]}, projection=[WatchID, ...], limit=50000001, file_type=parquet |
| |
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.127 seconds.
>
```
(note only the limit, not the offset, is processed)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
