Hi all,

When reading from a schema-inferable source such as Parquet, if no new data
is found then, if I understand correctly, no schema can be inferred, and
none needs to be.

The method org.apache.hudi.utilities.sources.InputBatch#getSchemaProvider
requires a non-null schemaProvider, and
org.apache.hudi.utilities.deltastreamer.DeltaSync#readFromSource calls
getSchemaProvider() in all cases, including the no-new-data case. As a
result, an exception is thrown asking for a schema provider to be set, even
when reading from a schema-inferable Parquet source. I don't think this is
ideal.

I have a short draft PR that accepts a null schema provider when there is
no new data:
https://github.com/apache/incubator-hudi/pull/1584/files
That said, I would actually prefer another approach: having
getSchemaProvider() return Option<SchemaProvider>.
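
To make the second approach concrete, here is a minimal sketch of what I
have in mind. It is not the actual Hudi code: SchemaProviderSketch,
InputBatchSketch and ReadFromSourceSketch are hypothetical stand-ins, the
exception message is made up, and java.util.Optional is used in place of
Hudi's own Option type so that the snippet compiles on its own.

```java
import java.util.Optional;

// Hypothetical stand-in for Hudi's SchemaProvider; one method is enough here.
interface SchemaProviderSketch {
    String sourceSchema();
}

// Simplified stand-in for InputBatch (assumption: not the real class layout).
final class InputBatchSketch<T> {
    private final Optional<T> batch;
    private final String checkpointForNextBatch;
    private final SchemaProviderSketch schemaProvider; // may be null when nothing was read

    InputBatchSketch(Optional<T> batch, String checkpointForNextBatch,
                     SchemaProviderSketch schemaProvider) {
        this.batch = batch;
        this.checkpointForNextBatch = checkpointForNextBatch;
        this.schemaProvider = schemaProvider;
    }

    Optional<T> getBatch() {
        return batch;
    }

    String getCheckpointForNextBatch() {
        return checkpointForNextBatch;
    }

    // Returns an empty Optional instead of throwing when no provider is set.
    Optional<SchemaProviderSketch> getSchemaProvider() {
        return Optional.ofNullable(schemaProvider);
    }
}

// Caller side, loosely modelled on what readFromSource would need to do.
final class ReadFromSourceSketch {
    static String describe(InputBatchSketch<String> input) {
        // No new data: nothing to write, so no schema is required.
        if (!input.getBatch().isPresent()) {
            return "no new data, schema not needed";
        }
        // Data is present: a missing provider is still an error.
        SchemaProviderSketch provider = input.getSchemaProvider()
            .orElseThrow(() -> new IllegalStateException(
                "schema provider required when data is present"));
        return "schema: " + provider.sourceSchema();
    }
}
```

With this shape, the decision about whether a missing schema provider is an
error moves to the caller, which knows whether there is any data to write.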

In case I have misunderstood the logic or the use case, I'd like to ask for
some feedback on this change.

Thank you.

Regards,
Raymond
