[ https://issues.apache.org/jira/browse/NIFI-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Villard resolved NIFI-10654.
-----------------------------------
    Resolution: Feedback Received

Apache NiFi 1.x is no longer maintained and no new release is planned on the 
1.x release line. Marking as resolved as part of a cleanup operation. Please 
open a new one with an updated description if this is still relevant for NiFi 
2.x.

> UnpackContent Processor Doesn't Support Multi-part Files
> --------------------------------------------------------
>
>                 Key: NIFI-10654
>                 URL: https://issues.apache.org/jira/browse/NIFI-10654
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.18.0
>            Reporter: threeplanetssoftware
>            Priority: Minor
>         Attachments: encrypted.zip, unencrypted-1.zip
>
>
> I'm filing this as a bug due to this behavior generally working in zip 
> implementations and not in the UnpackContent processor. I can understand an 
> argument for making this an improvement ticket and will happily go that route 
> if the maintainers choose to change it.
> I am trying to deal with large (dozens of GB), split zip files that are 
> password protected. The zip file is split to allow for better downloading of 
> the parts, rather than having to wait for one 45GB file to download 
> successfully. A multipart zip file can't be handled piecemeal, so picking up 
> each part as a FlowFile and routing it into the UnpackContent processor won't 
> work.
> I tried putting them together manually first to at least make sure that would 
> work, but UnpackContent still refused with this error: 
> "UnpackContent[id=snipped] Unable to unpack FlowFile[filename=license-3.zip] 
> because it does not appear to have any entries; routing to failure." 
> Meanwhile, unzip opened the archive successfully, even if it gave a warning 
> about the multiple parts being put into the same file.
> I also tried this with an unencrypted split zip that I reconstructed and 
> UnpackContent failed with this error: 
> "UnpackContent[id=d834e7ae-0183-1000-cfe8-0806ea6d348d] Unable to unpack 
> FlowFile[filename=unencrypted_reconstructed-2.zip]; routing to failure: 
> org.apache.nifi.processor.exception.ProcessException: IOException thrown from 
> UnpackContent[id=d834e7ae-0183-1000-cfe8-0806ea6d348d]: 
> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: 
> Unsupported feature splitting used in archive.
> - Caused by: 
> org.apache.commons.compress.archivers.zip.UnsupportedZipFeatureException: 
> Unsupported feature splitting used in archive."
> {*}Bug Request{*}: Can UnpackContent be fixed to support unpacking multipart 
> zip files, encrypted or not, that have been put back together in the proper 
> order prior to arrival at that processor?
> {*}Potential Feature Request{*}: Can UnpackContent (or some other processor) 
> be made to ingest multiple parts of a zip file and cat them together in the 
> right order in one FlowFile? The first file should be obvious from zip 
> headers and the last file should also be obvious from zip footers. Everything 
> in the middle should be sorted numerically by file extension. This would 
> allow me to use the InvokeHTTP processor to fetch these large files and have 
> them get joined together and forwarded on once the entire thing was built.
> {*}Reproduction{*}: I'm attaching two much smaller files to this ticket. Both 
> were made by concatenating a few files from the latest NiFi release to get 
> enough data to split, then splitting on a Linux machine with a command such 
> as: `zip --split-size 64k unencrypted.zip LICENSE`. `unencrypted.zip` is a 
> multiple part zip without a password that was concatenated back together in 
> the proper order. `encrypted.zip` is the same file and command, but with the 
> addition of the password "password" (no quotes). Running `unzip` on these 
> produces the correct file (LICENSE, md5sum: 108db6ee2249df0e1c7df85216e1b883).
> In NiFi, I have a GetFile processor pick the file up explicitly by name and 
> send it directly to UnpackContent with mime type set to zip. For the 
> encrypted file, I added the password as necessary.
>  
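
The joining step described in the feature request above could be sketched
roughly as follows. This is a minimal illustration, not NiFi code. It assumes
the parts use the naming that `zip --split-size` produces (`name.z01`,
`name.z02`, ..., with the final part named `name.zip`). The function names
`order_parts` and `concatenate_parts` are hypothetical. As the errors quoted
above show, UnpackContent in 1.18.0 still rejected a reconstructed archive, so
this covers only ordering and concatenation, not a fix for the processor.

```python
import re
import shutil
from pathlib import Path

# Assumption: intermediate parts end in .z01, .z02, ... and the last part in .zip.
_PART_SUFFIX = re.compile(r"^\.z(\d+)$", re.IGNORECASE)


def order_parts(paths):
    """Return the parts in concatenation order: numbered parts, then the .zip part."""
    numbered = []
    final = []
    for path in map(Path, paths):
        match = _PART_SUFFIX.match(path.suffix)
        if match:
            # Sort on the integer value so that .z10 comes after .z02.
            numbered.append((int(match.group(1)), path))
        elif path.suffix.lower() == ".zip":
            final.append(path)
        else:
            raise ValueError(f"not a split-zip part: {path}")
    if len(final) != 1:
        raise ValueError("expected exactly one final .zip part")
    numbered.sort(key=lambda item: item[0])
    return [path for _, path in numbered] + final


def concatenate_parts(paths, destination):
    """Write all parts, in order, into a single file and return its path."""
    destination = Path(destination)
    with destination.open("wb") as out:
        for part in order_parts(paths):
            with part.open("rb") as src:
                # Stream in chunks so multi-GB parts are never held in memory.
                shutil.copyfileobj(src, out)
    return destination
```

Calling `concatenate_parts(["big.zip", "big.z02", "big.z01"], "joined.zip")`
would write `big.z01`, `big.z02`, and `big.zip` into `joined.zip` in that
order, whatever order the parts arrived in.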



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
