[ 
https://issues.apache.org/jira/browse/COMPRESS-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099530#comment-17099530
 ] 

Bear R Giles edited comment on COMPRESS-513 at 5/5/20, 5:05 AM:
----------------------------------------------------------------

Here are some more details.

Header:
{noformat}
 0000: 2e 2f 2e 2f 40 4c 6f 6e - 67 4c 69 6e 6b 00 00 00 ././@LongLink... 
 0010: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0020: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0030: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0040: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0050: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0060: 00 00 00 00 30 30 30 30 - 36 34 34 00 30 30 30 30 ....0000644.0000 
 0070: 30 30 30 00 30 30 30 30 - 30 30 30 00 30 30 30 30 000.0000000.0000 
 0080: 30 30 30 30 31 35 37 00 - 30 30 30 30 30 30 30 30 0000157.00000000 
 0090: 30 30 30 00 30 31 31 36 - 30 36 00 20 4c 00 00 00 000.011606. L... 
 00a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00b0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00c0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00d0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00e0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00f0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0100: 00 75 73 74 61 72 20 20 - 00 72 6f 6f 74 00 00 00 .ustar  .root... 
 0110: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0120: 00 00 00 00 00 00 00 00 - 00 72 6f 6f 74 00 00 00 .........root... 
 0130: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0140: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0150: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0160: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0170: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0180: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0190: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01b0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01c0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01d0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01e0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01f0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
{noformat}

The decoded PAX headers:

{noformat}
 -- Global Pax headers
 -- Extra PAX headers
 -- TarArchiveStructSparse
 offset: 0, numbytes: 16384
 offset: 24576, numbytes: 12288
 offset: 40960, numbytes: 12288
 offset: 57344, numbytes: 12288
 offset: 4202496, numbytes: 0
{noformat}

That last offset (4202496) matches the header's 'realsize' mentioned earlier.
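
As a quick sanity check on the decoded map: the chunk lengths should add up to the entry's stored size (53248 in the issue description), and the last chunk should end at the real size (4202496). Here is a minimal Java sketch of that arithmetic. The class and method names are made up for illustration and are not part of Commons Compress.

```java
class SparseMapCheck {
    // {offset, numbytes} pairs copied from the decoded TarArchiveStructSparse list above.
    static final long[][] DUMPED_CHUNKS = {
        {0L, 16384L},
        {24576L, 12288L},
        {40960L, 12288L},
        {57344L, 12288L},
        {4202496L, 0L}
    };

    /** Bytes actually stored in the archive: the sum of all chunk lengths. */
    static long storedSize(long[][] chunks) {
        long total = 0;
        for (long[] chunk : chunks) {
            total += chunk[1];
        }
        return total;
    }

    /** Size of the expanded file implied by the map: the furthest chunk end. */
    static long impliedRealSize(long[][] chunks) {
        long end = 0;
        for (long[] chunk : chunks) {
            end = Math.max(end, chunk[0] + chunk[1]);
        }
        return end;
    }

    public static void main(String[] args) {
        System.out.println("stored size: " + storedSize(DUMPED_CHUNKS));
        System.out.println("real size:   " + impliedRealSize(DUMPED_CHUNKS));
    }
}
```

Both numbers agree with the 'size' and 'real' values reported for the entry, so the sparse map itself looks complete with these five structs.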

The 'old GNU header' block contains:
{noformat}
 0000: c3 ca 04 c1 00 00 02 00 - 03 00 00 00 00 10 00 00 ................ 
 0010: 04 00 00 00 00 04 00 00 - 03 00 00 00 01 00 00 00 ................ 
 0020: 00 00 00 00 fc 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0030: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0040: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0050: 73 77 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 sw.............. 
 0060: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0070: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0080: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0090: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00b0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00c0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00d0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00e0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 00f0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0100: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0110: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0120: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0130: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0140: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0150: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0160: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0170: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0180: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 0190: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01a0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01b0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01c0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01d0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01e0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
 01f0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................ 
{noformat}

I thought this was a SQLite database, but it's not. I don't have that specific 
file on my system at the moment, but a similar one (with a different number in 
the path) looks very familiar:

{noformat}
 00000000 c3 ca 04 c1 00 00 02 00 03 00 00 00 00 10 00 00 |................|
 00000010 04 00 00 00 00 04 00 00 03 00 00 00 01 00 00 00 |................|
 00000020 00 00 00 00 fc 00 00 00 00 00 00 00 00 00 00 00 |................|
 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 *
 00000050 73 77 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |sw..............|
 00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 *
{noformat}

(The format differs because I used a different tool.)

Putting it together, it looks like `readOldGNUSparse()` is either being called 
with `currEntry.isExtended()` incorrectly set, or it's missing an additional 
test that tells it there are no additional headers.
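
To show why the first field of that block blows up, here is a rough standalone re-creation of the decoding step, based only on the buffer/remainder/value figures in the issue description. It is an assumption on my part that this mirrors what `TarUtils` does internally (marker byte dropped, the rest read as a signed big-endian number, then checked against 63 bits). The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

class SparseFieldDecode {
    // First 12 bytes of the dumped block, i.e. what would be read as the first sparse offset.
    static final byte[] FIRST_FIELD = {
        (byte) 0xc3, (byte) 0xca, 0x04, (byte) 0xc1, 0x00, 0x00,
        0x02, 0x00, 0x03, 0x00, 0x00, 0x00
    };

    /** High bit set in the leading byte marks a binary (base-256) field rather than octal text. */
    static boolean looksBinary(byte[] field) {
        return (field[0] & 0x80) != 0;
    }

    /** Signed big-endian value of everything after the marker byte (the 'remainder'). */
    static BigInteger remainderValue(byte[] field) {
        return new BigInteger(Arrays.copyOfRange(field, 1, field.length));
    }

    /** A value only fits a signed long if it needs at most 63 bits plus the sign. */
    static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        BigInteger value = remainderValue(FIRST_FIELD);
        System.out.println("binary marker: " + looksBinary(FIRST_FIELD));
        System.out.println("value:         " + value);
        System.out.println("fits in long:  " + fitsInLong(value));
    }
}
```

The value it computes is the same -65259544571650071836229632 shown in the issue description, which needs far more than 63 bits. That suggests these bytes are file data being parsed as a sparse struct, not a real sparse header.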



> [Tar] error decoding sparse file header
> ---------------------------------------
>
>                 Key: COMPRESS-513
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-513
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.20
>         Environment: Ubuntu 19.10.
> Gnu tar 1.30
> File generated with
>  
> {code:java}
> PREFIX=/media/backups/monthly/users
> SYSTEM=`uname -n`
> TS=`/bin/date +%Y-%m-%d`
> for USER in bgiles
> do
>     DIR=${PREFIX}/${USER}/home/${TS}
>     /usr/bin/install -d ${DIR}
>     LABEL=home-${USER}-${SYSTEM}-${TS}
>     BASENAME=${DIR}/${LABEL}
>     /bin/tar czvf ${BASENAME}.tar.gz \
>         --index-file ${BASENAME}.idx \
>         --exclude-tag=NOARCHIVE.TAG \
>         --exclude-caches-all \
>         --preserve-permissions \
>         --sparse \
>         --label=${LABEL} \
>         --one-file-system \
>         --directory /home/${USER} . 
>     /bin/gzip ${BASENAME}.idx
>     # change ownership
>     /bin/chown -R backup:backup ${DIR}
>     /bin/chmod -R o-rwx ${DIR}
>     /bin/chmod -R o-rwx,a-w ${DIR}/*
> done
> {code}
>  
>  
>            Reporter: Bear R Giles
>            Priority: Major
>
> I am seeing an IllegalArgumentException when attempting to scan a (gnu) tar 
> file containing a backup of my home directory. The entry is a sqlite database 
> table used by chromium.
> The archive file is 62 GB and contains over 1 million files. ( ! ) (Can you 
> tell I'm a developer?)
> The error is:
> {code:java}
> java.lang.IllegalArgumentException: At offset 0, 12 byte binary number 
> exceeds maximum signed long value
> java.lang.IllegalArgumentException: At offset 0, 12 byte binary number 
> exceeds maximum signed long value
>    at 
> org.apache.commons.compress.archivers.tar.TarUtils.parseBinaryBigInteger(TarUtils.java:231){code}
> From instrumenting the code I can determine that the TarArchiveEntry reports:
>  * name: 
> ./snap/chromium/1005/.config/chromium/Default/Storage/ext/jajcoljhdglkjpfefjkgiohbhnkkmipm/def/GPUCache/data_3
>  * mode: 0600
>  * size: 53248
>  * real: 4202496
> The (presumed) sparse headers are:
> {code:java}
> c3 ca 04 c1 00 00 02 00 03 00 00 00  |
> 00 10 00 00 04 00 00 00 00 04 00 00  |
> 03 00 00 00 01 00 00 00 00 00 00 00  |
> fc 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 73 77 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00  |
> 00 00 00 00 00 00 00 00 00 00 00 00 
> {code}
> And for this specific entry 
>  * buffer: c3 ca 04 c1 00 00 02 00 03 00 00 00
>  * remainder: ca 04 c1 00 00 02 00 03 00 00 00
>  * neg: false
>  * value: -65259544571650071836229632
> I'll add the full header in a comment later today. It looks likely that the 
> header format isn't properly recognized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
