stevedlawrence commented on code in PR #1274:
URL: https://github.com/apache/daffodil/pull/1274#discussion_r1725314517
##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -1165,13 +1168,24 @@ class Main(
case Some(processor) => {
Assert.invariant(!processor.isError)
val input = parseOpts.infile.toOption match {
- case Some("-") | None => STDIN
+ case Some("-") | None => InputSourceDataInputStream(STDIN)
case Some(file) => {
- val f = new File(file)
- new FileInputStream(f)
+ // for files <= 2GB, use a mapped byte buffer to avoid the overhead related to
+ // the BucketingInputSource. Larger files cannot be mapped so we cannot avoid it
+ val path = Paths.get(file)
+ val size = Files.size(path)
+ if (size <= Int.MaxValue) {
Review Comment:
The nightlies don't use the `parse` command so won't see any change. They
use the `performance` command which reads test files into a byte array before
testing to avoid overhead related to disk I/O.
We could create some patches that run on the nightlies, one that changes the
performance command to use a FileInputStream and one that changes it to use a
MappedByteBuffer, which would give us an idea of mmap vs file input stream. But
that feels like a decent amount of work just to figure out the file size at
which mmap overhead exceeds bucketing overhead. Also, based on my bucketing vs
non-bucketing tests, I suspect bucketing overhead is probably greater than mmap
overhead, even with small files, so we should always avoid bucketing when
possible.
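For reference, the choice being discussed could be sketched roughly as below. This is only an illustration using plain `java.nio`, not the code in this PR: the `InputChoice` object and its `open` helper are hypothetical names, and the `Either` return type is just a stand-in for whichever input source wrapper the CLI would actually build from the buffer or the stream.

```scala
import java.io.{ FileInputStream, InputStream }
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{ Files, Path, StandardOpenOption }

// Hypothetical helper, for illustration only
object InputChoice {
  // Left: the file was small enough to map, so no bucketing is needed.
  // Right: the file is too large to map, so fall back to a stream.
  def open(path: Path): Either[ByteBuffer, InputStream] = {
    val size = Files.size(path)
    if (size <= Int.MaxValue) {
      val channel = FileChannel.open(path, StandardOpenOption.READ)
      // The mapping stays valid after the channel is closed
      try Left(channel.map(FileChannel.MapMode.READ_ONLY, 0, size))
      finally channel.close()
    } else {
      Right(new FileInputStream(path.toFile))
    }
  }
}
```

A nightly patch along the lines described above would presumably time the parse once with each branch forced, rather than picking by size.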