dan-s1 commented on PR #7194:
URL: https://github.com/apache/nifi/pull/7194#issuecomment-1524048362
> @exceptionfactory @dan-s1 This is a really interesting PR. I've had a number of clients that would benefit from this sort of capability. My one big question/concern is whether or not this new reader will be tolerant of poorly formatted or inconsistent documents. As an example, one of the things I've seen over the years is people doing stuff like inserting arbitrary values at various points of the spreadsheet. Comments, stuff like that. Will this reader be flexible enough to ignore junk and badly formatted stuff or will it just fail on contact with such data? IMHO, to be genuinely useful it would need to have a flexible mode and a strict mode so that people could make at least a "best effort" at extracting data.
>
> (I haven't actually dug into the PR, so just asking some high level questions. Going to try to start reviewing today or tomorrow)

@MikeThomsen I believe the schema inference strategy will work for your customers, as it will infer all the different data types a column can have. Using it gives you the tolerant behavior, while supplying a more specific schema enforces the strict mode.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
