[
https://issues.apache.org/jira/browse/NIFI-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre Villard resolved NIFI-10654.
-----------------------------------
Resolution: Feedback Received
Apache NiFi 1.x is no longer maintained and no new release is planned on the
1.x release line. Marking as resolved as part of a cleanup operation. Please
open a new one with an updated description if this is still relevant for NiFi
2.x.
> UnpackContent Processor Doesn't Support Multi-part Files
> --------------------------------------------------------
>
> Key: NIFI-10654
> URL: https://issues.apache.org/jira/browse/NIFI-10654
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.18.0
> Reporter: threeplanetssoftware
> Priority: Minor
> Attachments: encrypted.zip, unencrypted-1.zip
>
>
> I'm filing this as a bug because this behavior generally works in zip
> implementations but not in the UnpackContent processor. I can understand an
> argument for making this an improvement ticket and will happily go that route
> if the maintainers choose to change it.
> I am trying to deal with large (dozens of GB), split zip files that are
> password protected. The zip file is split to allow for better downloading of
> the parts, rather than having to wait for one 45GB file to download
> successfully. A multipart zip file can't be handled piecemeal, so picking up
> each part as a FlowFile and routing it into the UnpackContent processor won't
> work.
> I tried putting them together manually first to at least make sure that would
> work, but UnpackContent still refused with this error:
> "UnpackContent[id=snipped] Unable to unpack FlowFile[filename=license-3.zip]
> because it does not appear to have any entries; routing to failure."
> Meanwhile, unzip opened the archive successfully, even if it gave a warning
> about the multiple parts being put into the same file.
> I also tried this with an unencrypted split zip that I reconstructed and
> UnpackContent failed with this error:
> "UnpackContent[id=d834e7ae-0183-1000-cfe8-0806ea6d348d] Unable to unpack
> FlowFile[filename=unencrypted_reconstructed-2.zip]; routing to failure:
> org.apache.nifi.processor.exception.ProcessException: IOException thrown from
> UnpackContent[id=d834e7ae-0183-1000-cfe8-0806ea6d348d]:
> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException:
> Unsupported feature splitting used in archive.
> - Caused by:
> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException:
> Unsupported feature splitting used in archive."
> {*}Bug Request{*}: Can UnpackContent be fixed to support unpacking multipart
> zip files, encrypted or not, that have been put back together in the proper
> order prior to arrival at that processor?
> {*}Potential Feature Request{*}: Can UnpackContent (or some other processor)
> be made to ingest multiple parts of a zip file and cat them together in the
> right order in one FlowFile? The first file should be obvious from zip
> headers and the last file should also be obvious from zip footers. Everything
> in the middle should be sorted numerically by file extension. This would
> allow me to use the InvokeHTTP processor to fetch these large files and have
> them get joined together and forwarded on once the entire thing was built.
> {*}Reproduction{*}: I'm attaching two much smaller files to this ticket. Both
> were made by concatenating a few files from the latest NiFi release to get
> enough data to split, then splitting on a Linux machine with a command such
> as `zip --split-size 64k unencrypted.zip LICENSE`. `unencrypted.zip` is a
> multipart zip without a password that was concatenated back together in the proper
> order. `encrypted.zip` is the same file and command, but with the addition of
> the password "password" (no quotes). Running `unzip` on these produces the
> correct file (LICENSE, md5sum: 108db6ee2249df0e1c7df85216e1b883).
> In NiFi, I have a GetFile processor that picks the file up explicitly by name
> and sends it directly to UnpackContent with mime type set to zip. For the
> encrypted file, I added the password as necessary.
>
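The joining step described in the Potential Feature Request above could be sketched roughly as follows. This is only an illustration outside NiFi, not an existing processor or API. It assumes the parts use the Info-ZIP split naming (`name.z01`, `name.z02`, ..., with `name.zip` as the final part); the function names `order_parts` and `join_parts` are hypothetical. It covers only ordering and concatenation, and makes no claim that UnpackContent will accept the joined file.

```python
import re
import shutil
from pathlib import Path

# Assumption: parts follow the Info-ZIP split naming, name.z01, name.z02, ...,
# with name.zip as the final part of the set.
_PART_RE = re.compile(r"\.z(\d+)$", re.IGNORECASE)


def order_parts(paths):
    """Return the parts in concatenation order: .z01, .z02, ..., then .zip."""
    numbered = []
    last = []
    for p in map(Path, paths):
        m = _PART_RE.search(p.name)
        if m:
            # Sort numerically so that .z10 comes after .z02, not before it.
            numbered.append((int(m.group(1)), p))
        elif p.suffix.lower() == ".zip":
            last.append(p)
        else:
            raise ValueError(f"unrecognized part name: {p.name}")
    if len(last) != 1:
        raise ValueError("expected exactly one .zip part")
    numbered.sort(key=lambda t: t[0])
    return [p for _, p in numbered] + last


def join_parts(paths, dest):
    """Concatenate the ordered parts into dest, streaming to limit memory use."""
    with open(dest, "wb") as out:
        for part in order_parts(paths):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return dest
```

For example, `join_parts(["big.zip", "big.z02", "big.z01"], "big_joined.zip")` would write the bytes of `big.z01`, `big.z02`, and `big.zip` in that order. Copying in chunks with `shutil.copyfileobj` avoids holding a multi-gigabyte part in memory.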
--
This message was sent by Atlassian Jira
(v8.20.10#820010)