friendlymatthew commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3168929125
> Thanks @friendlymatthew > > While I can see the rationale behind this change, I'd like to understand more about why this approach is definitively better than the existing implementation. Given that this modifies the `FileSource` it would be a breaking change for downstream users? > > Please let me know the specific example where builder wouldn't work for you? Hi, the `FileConfigBuilder` initializing batch size as an `Option` just for every subsequent call site to require it as `Some` felt awkward. The `with_batch_size` signature reinforces this, since it only accepts a `usize` and immediately wraps it in `Some`. From what I understand, batch size is only used when opening/reading files. Instead of requiring the user to set it in advance (and risking a panic if omitted), I think it's cleaner to pass it as a required parameter at the point where it's actually needed: ```rs let config = CsvSource::new(true, b',', b'"') .with_comment(Some(b'#')) .with_schema(schema) // .with_batch_size(8192) --> If I omit this, we'll panic when we create a file opener .with_projection(&scan_config); let opener = config.create_file_opener(object_store, &scan_config, 0); // let's just pass it in here? ``` > The current approach using with_batch_size() follows a builder pattern which is flexible and composable. Given that this modifies the FileSource trait and affects multiple DataSource structs, what's the migration path for downstream users? One option would be to deprecate `with_batch_size` and point users toward the new parameter-passing approach. (I'm considering doing something similar for `with_projection`: https://github.com/apache/datafusion/issues/17095). Stepping back, my main goal is to break the tight coupling between `FileSourceConfig` and `FileSource`. Right now, the config struct both stores configuration values _and_ the underlying file source, which in turn needs to call back into the config-- that feels like a design smell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org