[
https://issues.apache.org/jira/browse/CAMEL-12698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609313#comment-16609313
]
ASF GitHub Bot commented on CAMEL-12698:
----------------------------------------
MakotoTheKnight commented on a change in pull request #2454: CAMEL-12698: Use
the Stream API to read files instead of Scanner
URL: https://github.com/apache/camel/pull/2454#discussion_r216350632
##########
File path:
components/camel-bindy/src/main/java/org/apache/camel/dataformat/bindy/csv/BindyCsvDataFormat.java
##########
@@ -138,58 +142,74 @@ public Object unmarshal(Exchange exchange, InputStream
inputStream) throws Excep
// List of Pojos
List<Map<String, Object>> models = new ArrayList<>();
- // Pojos of the model
- Map<String, Object> model;
InputStreamReader in = null;
- Scanner scanner = null;
try {
if (checkEmptyStream(factory, inputStream)) {
return models;
}
in = new InputStreamReader(inputStream,
IOHelper.getCharsetName(exchange));
-
- // Scanner is used to read big file
- scanner = new Scanner(in);
-
+
// Retrieve the separator defined to split the record
String separator = factory.getSeparator();
String quote = factory.getQuote();
ObjectHelper.notNull(separator, "The separator has not been
defined in the annotation @CsvRecord or not instantiated during initModel.");
-
- int count = 0;
-
- // If the first line of the CSV file contains columns name, then we
- // skip this line
- if (factory.getSkipFirstLine()) {
- // Check if scanner is empty
- if (scanner.hasNextLine()) {
- scanner.nextLine();
+ AtomicInteger count = new AtomicInteger(0);
+
+ // Use a Stream to stream a file across.
+ try (Stream<String> lines = new BufferedReader(in).lines()) {
+ int linesToSkip = 0;
+
+ // If the first line of the CSV file contains columns name,
then we
+ // skip this line
+ if (factory.getSkipFirstLine()) {
+ linesToSkip = 1;
}
- }
-
- while (scanner.hasNextLine()) {
-
- // Read the line
- String line = scanner.nextLine().trim();
-
- if (ObjectHelper.isEmpty(line)) {
- // skip if line is empty
- continue;
+
+ // Consume the lines in the file via a consumer method,
passing in state as necessary.
+ // If the internals of the consumer fail, we unwrap the checked
exception upstream.
+ try {
+ lines.skip(linesToSkip)
+ .forEachOrdered(consumeFile(factory, models,
separator, quote, count));
+ } catch (WrappedException e) {
+ throw e.getWrappedException();
Review comment:
Because we're performing this operation inside a `Stream`, we can only throw
unchecked exceptions from it. The function passed to `forEachOrdered` is a
`Consumer`, whose `accept` method declares no checked exceptions, yet the work
it does can throw them. Wrapping the checked exception in an unchecked one
inside the consumer, then unwrapping and rethrowing it outside the stream, is
the way to work around that limitation in `Stream`s.
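
For readers following along, the wrap-and-unwrap pattern can be sketched
roughly as below. This is only an illustration, not the code in the pull
request: `WrappedIOException`, `parseLine` and `readAll` are hypothetical names
made up for the example, and it narrows the checked exception to `IOException`
to keep the sketch short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

class CheckedInStream {

    /** Hypothetical unchecked carrier for a checked IOException. */
    static final class WrappedIOException extends RuntimeException {
        private final IOException wrapped;

        WrappedIOException(IOException wrapped) {
            super(wrapped);
            this.wrapped = wrapped;
        }

        IOException getWrapped() {
            return wrapped;
        }
    }

    /** Stand-in for per-line work that may throw a checked exception. */
    static void parseLine(String line, List<String> out) throws IOException {
        if (line.contains("bad")) {
            throw new IOException("cannot parse: " + line);
        }
        out.add(line.trim());
    }

    /** Consumer.accept declares no checked exceptions, so wrap them here. */
    static Consumer<String> consumer(List<String> out) {
        return line -> {
            try {
                parseLine(line, out);
            } catch (IOException e) {
                throw new WrappedIOException(e);
            }
        };
    }

    /** Runs the stream and restores the original checked exception. */
    static List<String> readAll(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Stream<String> lines = new BufferedReader(new StringReader(text)).lines()) {
            lines.forEachOrdered(consumer(out));
        } catch (WrappedIOException e) {
            throw e.getWrapped();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll("a,1\nb,2"));
        try {
            readAll("a,1\nbad\nc,3");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Callers of `readAll` see an ordinary `IOException`, so the wrapper type never
leaks out of the method that owns the stream.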
----------------------------------------------------------------
> Unmarshaling a CSV file with the NEL (next line) character will cause Bindy
> to misread the entire file
> ------------------------------------------------------------------------------------------------------
>
> Key: CAMEL-12698
> URL: https://issues.apache.org/jira/browse/CAMEL-12698
> Project: Camel
> Issue Type: Bug
> Components: camel-bindy
> Affects Versions: 2.22.0
> Reporter: Jason Black
> Priority: Major
>
> I am using Apache Camel to process a lot of large CSV files, and relying on
> Bindy to assist with unmarshalling them into POJOs.
> We have an upstream data bug which causes a record of ours to contain the
> Unicode character
> [NEL|http://www.fileformat.info/info/unicode/char/85/index.htm], but while
> we're working through the cause of that, I found it curious as to what Bindy
> is actually doing with it. We rely on the unmarshal process to perform a
> batch insert, and because our POJO is missing certain fields, we started
> observing that the
> Bindy relies on Scanner to read lines from a large file; however, Scanner
> also treats the NEL character as a line separator, so it splits the record
> at that point. The modern Files API does not do this: it splits lines only
> on the traditional line terminators (\n, \r, or \r\n).
> There are two ways to fix this from what I've been able to smoke test:
> * Change the Scanner implementation to use a delimiter of the more
> traditional newline characters
> * Use Java 8's Files API and stream the file in
> I would personally want to use the Files API to handle this since it's more
> robust and capable of higher performance, but I'll explore both approaches
> and see where I end up.
>
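
The line-splitting difference described in the quoted issue, and the two fixes
it proposes, can be reproduced with a small sketch along these lines. The class
name, method names and sample data are made up for illustration, and the
delimiter pattern is just one plausible choice, not necessarily what the
reporter tested.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

class NelLineSplitting {

    // Two CSV records; the first has a NEL (U+0085) inside a field.
    static final String DATA = "1,foo\u0085bar\n2,baz\n";

    /** Default Scanner: nextLine() also breaks on NEL. */
    static List<String> withScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    /** BufferedReader.lines(): breaks only on \n, \r or \r\n. */
    static List<String> withBufferedReader(String text) {
        return new BufferedReader(new StringReader(text))
                .lines()
                .collect(Collectors.toList());
    }

    /** Scanner restricted to the traditional line terminators. */
    static List<String> withDelimiter(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("\r\n|[\r\n]")) {
            while (scanner.hasNext()) {
                lines.add(scanner.next());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Scanner:        " + withScanner(DATA).size() + " lines");
        System.out.println("BufferedReader: " + withBufferedReader(DATA).size() + " lines");
        System.out.println("Delimiter:      " + withDelimiter(DATA).size() + " lines");
    }
}
```

With this input the default Scanner reports three lines because it breaks the
first record at the NEL, while the other two approaches report two lines and
keep the NEL inside the field.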