Re: [PR] mmap files when possible to improve CLI parse performance [daffodil]

via GitHub Wed, 21 Aug 2024 07:44:05 -0700


stevedlawrence commented on code in PR #1274:
URL: https://github.com/apache/daffodil/pull/1274#discussion_r1725182356



##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -1165,13 +1168,24 @@ class Main(
           case Some(processor) => {
             Assert.invariant(!processor.isError)
             val input = parseOpts.infile.toOption match {
-              case Some("-") | None => STDIN
+              case Some("-") | None => InputSourceDataInputStream(STDIN)
               case Some(file) => {
-                val f = new File(file)
-                new FileInputStream(f)
+                // for files <= 2GB, use a mapped byte buffer to avoid the overhead related to
+                // the BucketingInputSource. Larger files cannot be mapped so we cannot avoid it
+                val path = Paths.get(file)
+                val size = Files.size(path)
+                if (size <= Int.MaxValue) {

Review Comment:
   Possibly. Though I imagine that if you're parsing a small file with the CLI, 
the overhead of mmap is going to be relatively small compared to the overhead 
of starting up a JVM, so maybe it won't make a difference? I'm not sure. We 
can do some experiments to see if there's a benefit for smaller files.
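
   A minimal sketch of what such an experiment could look like, assuming we 
only want a rough cold-run comparison of reading a file through a mapped 
buffer versus a plain input stream. The object and method names 
(MmapVsStream, timeMapped, timeStreamed) are hypothetical and not part of 
Daffodil, and this measures raw byte reads only, not a real parse:

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper for a rough comparison; not Daffodil code.
object MmapVsStream {

  // Reads every byte through a read-only mapped buffer.
  // Returns (elapsed nanoseconds, sum of unsigned byte values).
  def timeMapped(path: Path): (Long, Long) = {
    val start = System.nanoTime()
    val channel = FileChannel.open(path, StandardOpenOption.READ)
    try {
      // A single mapping is limited to Int.MaxValue bytes, which is why
      // the PR only maps files at or below that size.
      val buf = channel.map(FileChannel.MapMode.READ_ONLY, 0L, channel.size())
      var sum = 0L
      while (buf.hasRemaining) sum += (buf.get() & 0xff)
      (System.nanoTime() - start, sum)
    } finally channel.close()
  }

  // Reads every byte through a regular input stream in 8 KiB chunks.
  // Returns (elapsed nanoseconds, sum of unsigned byte values).
  def timeStreamed(path: Path): (Long, Long) = {
    val start = System.nanoTime()
    val in = Files.newInputStream(path)
    try {
      val chunk = new Array[Byte](8192)
      var sum = 0L
      var n = in.read(chunk)
      while (n != -1) {
        var i = 0
        while (i < n) { sum += (chunk(i) & 0xff); i += 1 }
        n = in.read(chunk)
      }
      (System.nanoTime() - start, sum)
    } finally in.close()
  }
}
```

   The byte sum is only there so both paths do the same work and can be 
checked against each other. Since JVM startup is likely the dominant cost for 
small files, timing the whole CLI invocation on files of different sizes is 
probably the more meaningful measurement; this sketch only isolates the read 
path.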



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
