On 26.12.25 at 20:19, PDF Newbie via dev wrote:
Hi. I’ve reviewed the documentation for the PDFMerger utility located here:
https://pdfbox.apache.org/3.0/commandline.html#pdfmerger
And it appears that it only accepts a list of input files on the command line
with the -i option. While this is good, I would also like to be able to pass a
list of file names (one per line), and have the list of files added to a single
output file specified by the already-existing -o option.
Why? The number of files I’ll be merging is larger than most UNIX/Linux shell
environments permit, and file names may be up to 68 characters long
(64-character string + .pdf), further reducing the number of files I can merge
within one command.
While I don’t particularly care how this is implemented, other open source
utilities follow the convention of prefixing an input file name with an ‘at’
symbol (‘@’), or offer a separate option (“-l” / lowercase L) to specify the
file.
PDFBox already supports the '@' option, so there is no need to implement
anything else, provided there are no other limitations.
Simply put the command-line options into a text file and use the @-option
to append the options from that file to the command line:
java -jar pdfbox-app-3.y.z.jar merge @cli_options.txt
We should add that "hidden" feature to the documentation.
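For a long list of inputs, the options file can be generated rather than
written by hand. Below is a minimal sketch in Java. It assumes that the
options file takes one argument per line and that the file names contain no
whitespace or quote characters; the exact quoting rules of the @-option are
not covered here. The class name MergeOptionsFile is hypothetical; only the
-i and -o options and the cli_options.txt file name come from this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class MergeOptionsFile {

    // Builds the lines of an options file: "-i", every input name, then "-o" and the output.
    // Assumption: one argument per line, no whitespace or quotes in the names.
    static List<String> toOptionLines(List<String> inputs, String output) {
        List<String> lines = new ArrayList<>();
        lines.add("-i");
        lines.addAll(inputs);
        lines.add("-o");
        lines.add(output);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: text file with one PDF name per line; args[1]: name of the merged PDF
        List<String> inputs = new ArrayList<>();
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            if (!line.isBlank()) {
                inputs.add(line.strip());
            }
        }
        Files.write(Path.of("cli_options.txt"), toOptionLines(inputs, args[1]));
    }
}
```

The generated cli_options.txt would then be passed to the merge command
exactly as shown above.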
BUT there is a possible issue if one tries to merge too many or too big files
at once. As PDFBox merges one file after the other into the target without
saving the result, the merger utility will sooner or later run into an
OutOfMemory exception. To avoid that, the implementation would have to be
changed, e.g. to save the result after every 10 or so input files.
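Until the implementation changes, the same idea can be approximated from the
outside: split the input list into batches, merge each batch into an
intermediate file, and merge the intermediates at the end. The sketch below
shows only the batching step. The class name MergeBatches, the batch size
of 10 and the part-N.pdf names are illustrative assumptions, and whether
batching actually avoids the OutOfMemory problem for a given set of files is
untested.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeBatches {

    // Splits the inputs into consecutive batches of at most batchSize names,
    // preserving the original order.
    static List<List<String>> partition(List<String> inputs, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < inputs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, inputs.size());
            batches.add(new ArrayList<>(inputs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Treat the command-line arguments as the input file names.
        List<List<String>> batches = partition(List.of(args), 10);
        for (int i = 0; i < batches.size(); i++) {
            // Hypothetical intermediate name; each batch would be merged into it.
            System.out.println("part-" + i + ".pdf <- " + batches.get(i));
        }
    }
}
```

Each printed line corresponds to one merge run; a final run would then merge
the part-N.pdf files in order.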
Testing only needs to be basic — before beginning, ensure all files described
in the list file exist, and optionally test the correctness/compliance of the
input PDFs to ensure a defective merged PDF isn’t created, and ensure that the
input file doesn’t contain more pages than the PDF spec allows for, and
potentially adding a warning when files that contain incompatible options
(password protected features such as copying / printing, etc.) are added.
That feature sounds easy, but the details may become complicated.
- testing all files before merging will slow down the whole process,
especially if the number of files is huge
- it is tricky to define which files are correct/compliant, as there are
many files in the wild which are corrupt or don't follow the spec but
can still be read by Acrobat, which makes them "correct"/"compliant" in
practice, and PDFBox tries to read such files as well
- I have never heard of an explicit maximum number of pages, but there is a
maximum object number, which in the end limits the maximum
number of pages. However, such a PDF will most likely be huge, and I
expect PDFBox to run into an OutOfMemory or similar exception when
writing/reading such a PDF.
Given that the use case for this is commercial in nature, I’d like to offer a
bounty for this feature - I’m neutral as to which platform should be used for
posting / tracking / paying for this — just let me know which one. I’d prefer
to work with one of the core maintainers for the project, but other recognized
contributors are also eligible; my only requirement is that the change is added
to the main distribution. I’d also consider making a donation directly to the
project if that is preferred.
I'm hesitant to add your (enterprise) workflow to a command-line
utility which wasn't meant to be used in an enterprise environment.
IMHO that check feature should be implemented in a separate tool, and you
should do the checks yourself before feeding the merger utility with
a list of files.
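As a rough illustration of such a separate pre-check, the sketch below covers
only the simplest part of the request: reporting which entries of the list
file do not exist as regular files. It does not validate the PDFs themselves.
The class name ListFileCheck and the one-name-per-line list format are
assumptions taken from the request, not an existing PDFBox tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ListFileCheck {

    // Returns the names that do not point to an existing regular file.
    // Blank entries are ignored.
    static List<String> missingFiles(List<String> names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            String trimmed = name.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (!Files.isRegularFile(Path.of(trimmed))) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: text file with one PDF name per line
        List<String> missing = missingFiles(Files.readAllLines(Path.of(args[0])));
        missing.forEach(name -> System.err.println("missing: " + name));
        // A non-zero exit code lets a calling script stop before the merge starts.
        System.exit(missing.isEmpty() ? 0 : 1);
    }
}
```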
Andreas