stevedlawrence commented on code in PR #1274:
URL: https://github.com/apache/daffodil/pull/1274#discussion_r1725232905
##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -1165,13 +1168,24 @@ class Main(
case Some(processor) => {
Assert.invariant(!processor.isError)
val input = parseOpts.infile.toOption match {
- case Some("-") | None => STDIN
+ case Some("-") | None => InputSourceDataInputStream(STDIN)
case Some(file) => {
- val f = new File(file)
- new FileInputStream(f)
+ // for files <= 2GB, use a mapped byte buffer to avoid the overhead related to
+ // the BucketingInputSource. Larger files cannot be mapped so we cannot avoid it
+ val path = Paths.get(file)
+ val size = Files.size(path)
+ if (size <= Int.MaxValue) {
+ val fc = FileChannel.open(path, StandardOpenOption.READ)
+ val bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, size)
Review Comment:
We could, but I'm a little hesitant to force something on an API user if we
can't say for sure it will be faster in 100% of cases, especially if there are
cases where it could be slower (e.g., the small files you mentioned).
An alternative might be to just provide better API documentation instead,
something like:
> The InputStream variant has potential overhead due to streaming
> capabilities and support for unlimited data sizes. In some cases, better
> performance might come from using the ByteBuffer variant instead. For example,
> if your data is already in a byte array, one should use the Array[Byte] or
> ByteBuffer variants instead of wrapping it in a ByteArrayInputStream. As
> another example, instead of using a FileInputStream one could consider mapping
> the File to a MappedByteBuffer, keeping in mind that MappedByteBuffers might
> have different performance characteristics depending on the file size and
> system.
And then we leave it up to the API users to figure out what works best for
their system/environment?
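For illustration, here is a minimal sketch of the file case the suggested documentation describes, assuming plain JDK NIO and a hypothetical `FileInput` type that is not part of Daffodil's API. Files that fit in a single mapping (at most `Int.MaxValue` bytes) are memory-mapped, and anything larger falls back to an `InputStream`. The result would then be handed to the ByteBuffer or InputStream variant of `InputSourceDataInputStream`; that wiring is left out so the sketch stays self-contained.

```scala
import java.io.InputStream
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical result type: either a mapped buffer or a plain stream.
sealed trait FileInput
final case class Mapped(buffer: ByteBuffer) extends FileInput
final case class Streamed(stream: InputStream) extends FileInput

object FileInputChooser {
  // Hypothetical helper mirroring the size check in the diff above.
  def open(path: Path): FileInput = {
    val size = Files.size(path)
    if (size <= Int.MaxValue) {
      // A single MappedByteBuffer cannot exceed Int.MaxValue bytes.
      val fc = FileChannel.open(path, StandardOpenOption.READ)
      try {
        // The mapping stays valid after the channel is closed.
        Mapped(fc.map(FileChannel.MapMode.READ_ONLY, 0, size))
      } finally {
        fc.close()
      }
    } else {
      // Too large to map: fall back to streaming.
      Streamed(Files.newInputStream(path))
    }
  }
}
```

Unlike the diff above, this sketch closes the `FileChannel` once the mapping exists; the `FileChannel.map` Javadoc states that a mapping does not depend on the channel used to create it. Whether mapping actually beats streaming still depends on file size and system, which is why the comment leaves that choice to the API user.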