bharath-techie commented on PR #18971:
URL: https://github.com/apache/datafusion/pull/18971#issuecomment-3592310619
Hi @martin-g @alamb,
Could you help decide the following so we can move the PR forward?
- The limit is based on the number of rows, so we either keep it as `None` or
make it configurable. If we go the configuration route, I'm leaning towards
keeping `None` as the default, since the number of rows will vary with the
data: some users have few columns and many rows, others the reverse.
- One use case is ClickBench data, where we register partitions with ~99
million rows and want statistics for all files to be present (as described
in the parent issue https://github.com/apache/datafusion/issues/18952).
- Could you also help decide whether to do this in the background? Doing it
in the sync path would give deterministic behavior, I feel. Otherwise we need
to update the documentation to reflect the background behavior.
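
To make the configurable option concrete, here is a minimal sketch of a row limit that defaults to `None` (no limit). The names `StatsOptions`, `row_limit`, and `within_limit` are hypothetical and for illustration only; they are not taken from the PR or from DataFusion's API, and the exact check the limit performs is an assumption.

```rust
/// Hypothetical options struct; names are illustrative, not from the PR.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct StatsOptions {
    /// Maximum number of rows allowed; `None` (the default) means no limit.
    pub row_limit: Option<usize>,
}

impl StatsOptions {
    /// Returns true if `rows` stays within the configured limit.
    /// With `row_limit == None`, every row count is accepted.
    pub fn within_limit(&self, rows: usize) -> bool {
        match self.row_limit {
            None => true,
            Some(limit) => rows <= limit,
        }
    }
}
```

With the default, a ~99 million row partition is always accepted; a user who wants a cap sets `row_limit: Some(n)` explicitly.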
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]