dan-s1 commented on PR #7194:
URL: https://github.com/apache/nifi/pull/7194#issuecomment-1524048362

   > @exceptionfactory @dan-s1 This is a really interesting PR. I've had a 
number of clients that would benefit from this sort of capability. My one big 
question/concern is whether or not this new reader will be tolerant of poorly 
formatted or inconsistent documents. As an example, one of the things I've seen 
over the years is people doing stuff like inserting arbitrary values at various 
points of the spreadsheet. Comments, stuff like that. Will this reader be 
flexible enough to ignore junk and badly formatted stuff or will it just fail 
on contact with such data? IMHO, to be genuinely useful it would need to have a 
flexible mode and a strict mode so that people could make at least a "best 
effort" at extracting data.
   > 
   > (I haven't actually dug into the PR, so just asking some high-level 
questions. Going to try to start reviewing today or tomorrow)
   
   @MikeThomsen I believe the schema inference strategy will work for your 
customers, as it will infer all the different data types a column can have. 
Using it provides the tolerance you describe, while supplying a more specific 
schema enforces the strict mode.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.