[ 
https://issues.apache.org/jira/browse/TIKA-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486030#comment-14486030
 ] 

Paolo Nacci commented on TIKA-1022:
-----------------------------------

DWG files saved from AutoCAD 2015 have 0x5 as the first padding byte, where 
CUSTOM_PROPERTIES_ALT_PADDING_VALUES[0] expects 0x2 (even if you save the file 
in the 2010 or 2013 format).
I think it is safe to change DWGParser.java so that it checks that padding[0] 
is <= 5 and that the other padding bytes are 0, and drops the 
CUSTOM_PROPERTIES_ALT_PADDING_VALUES check.
I also added AC1027 (AutoCAD 2013, 2014, 2015) to the version check.

Proposed patch for DWGParser.java:
96,103c96,98
<     private static final int CUSTOM_PROPERTIES_SKIP = 20;
<     
<     /** 
<      * The value of padding bytes other than 0 in some DWG files.
<      */
<     private static final int[] CUSTOM_PROPERTIES_ALT_PADDING_VALUES = new int[] {0x2, 0, 0, 0};
< 
<     public void parse(
---
>     private static final int CUSTOM_PROPERTIES_SKIP = 20; 
>       
>       public void parse(
125c120
<         } else if (version.equals("AC1021") || version.equals("AC1024")) {
---
>         } else if (version.equals("AC1021") || version.equals("AC1024") || version.equals("AC1027")) {
325c320
<        // There should be 4 zero bytes or CUSTOM_PROPERTIES_ALT_PADDING_VALUES next
---
>        // There should be 4 zero bytes next
328,335c323,326
<        if((padding[0] == 0 && padding[1] == 0 &&
<              padding[2] == 0 && padding[3] == 0) ||
<              (padding[0] == CUSTOM_PROPERTIES_ALT_PADDING_VALUES[0] && 
<                padding[1] == CUSTOM_PROPERTIES_ALT_PADDING_VALUES[1] &&
<                padding[2] == CUSTOM_PROPERTIES_ALT_PADDING_VALUES[2] &&
<                padding[3] == CUSTOM_PROPERTIES_ALT_PADDING_VALUES[3])) {
<            
<           // Looks hopeful, skip on
---
>        if(padding[0] <= 5 && padding[1] == 0 &&
>              padding[2] == 0 && padding[3] == 0) {
> 
>                          // Looks hopeful, skip on

I attached testDWG2015_custom_props.dwg for use in a unit test.
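
To make the relaxed check easier to review, here is a minimal standalone 
sketch of it. It is not the committed Tika code: the class name, the method 
names and the MAX_FIRST_PADDING_BYTE constant are hypothetical, and the bound 
of 5 is only an assumption based on the 0x2 and 0x5 values seen so far. One 
difference from the patch above: Java bytes are signed, so a plain 
"padding[0] <= 5" would also accept 0x80 to 0xFF; the sketch masks the byte 
first.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class PaddingCheckSketch {
    // Hypothetical upper bound for the first padding byte. 0x2 and 0x5 are
    // the values reported in this issue; the bound itself is an assumption.
    private static final int MAX_FIRST_PADDING_BYTE = 5;

    /** True if the 4 bytes look like the padding before the custom properties. */
    static boolean looksLikeCustomPropertiesPadding(byte[] padding) {
        if (padding == null || padding.length != 4) {
            return false;
        }
        // Mask to 0..255 so that bytes >= 0x80 are not treated as negative.
        int first = padding[0] & 0xFF;
        return first <= MAX_FIRST_PADDING_BYTE
                && padding[1] == 0 && padding[2] == 0 && padding[3] == 0;
    }

    /** Reads the next 4 bytes from the stream and applies the check. */
    static boolean readAndCheck(InputStream stream) throws IOException {
        byte[] padding = new byte[4];
        // readFully throws EOFException if fewer than 4 bytes remain.
        new DataInputStream(stream).readFully(padding);
        return looksLikeCustomPropertiesPadding(padding);
    }

    public static void main(String[] args) throws IOException {
        byte[] fromAutocad2015 = {0x5, 0, 0, 0};
        System.out.println(readAndCheck(new ByteArrayInputStream(fromAutocad2015)));
    }
}
```

With this sketch, {0, 0, 0, 0}, {0x2, 0, 0, 0} and {0x5, 0, 0, 0} are all 
accepted, while a first byte above 5 or any non-zero byte in positions 1 to 3 
is rejected.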


> DWG Custom properties not extracted
> -----------------------------------
>
>                 Key: TIKA-1022
>                 URL: https://issues.apache.org/jira/browse/TIKA-1022
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.0, 1.1, 1.2, 1.3
>            Reporter: Paolo Nacci
>            Assignee: Ray Gauss II
>              Labels: patch
>             Fix For: 1.3
>
>         Attachments: quick2010-tika-no-custom.dwg
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Based on some code I provided some time ago (Alfresco forum), Derek Hulley 
> opened ALF-2262 and Nick Burch opened TIKA-413, and the code was committed 
> to Tika (0.8).
> With the sample DWG provided there, Tika (0.8 to 1.2) works correctly, but 
> with the attached file it returns no custom metadata (my original C code 
> returns the correct custom metadata; the DWG is in "2010" format).
> I tested tika-app.1.0.jar, tika-app.1.2.jar and a Tika 1.3 snapshot.
> All versions could be affected by this bug. 
> I found the failing code in skipToCustomProperties() of DWGParser.java, lines 
> 320-321: 
> if(padding[0] == 0 && padding[1] == 0 &&
>   padding[2] == 0 && padding[3] == 0) {
> The padding[0] byte is not always 0 (the attached file has 0x2), and there is 
> probably no need to check those bytes.
> Index: DWGParser.java
> ===================================================================
> --- DWGParser.java    (revision 1407024)
> +++ DWGParser.java    (working copy)
> @@ -93,7 +93,7 @@
>       * How far to skip after the last standard property, before
>       *  we find any custom properties that might be there.
>       */
> -    private static final int CUSTOM_PROPERTIES_SKIP = 20; 
> +    private static final int CUSTOM_PROPERTIES_SKIP = 24; 
>  
>      public void parse(
>              InputStream stream, ContentHandler handler,
> @@ -317,13 +317,7 @@
>  
>      private int skipToCustomProperties(InputStream stream) 
>              throws IOException, TikaException {
> -       // There should be 4 zero bytes next
> -       byte[] padding = new byte[4];
> -       IOUtils.readFully(stream, padding);
> -       if(padding[0] == 0 && padding[1] == 0 &&
> -             padding[2] == 0 && padding[3] == 0) {
> -          // Looks hopeful, skip on
> -          padding = new byte[CUSTOM_PROPERTIES_SKIP];
> +          byte[] padding = new byte[CUSTOM_PROPERTIES_SKIP];
>            IOUtils.readFully(stream, padding);
>            
>            // We should now have the count
> @@ -337,10 +331,6 @@
>               // No properties / count is too high to trust
>               return 0;
>            }
> -       } else {
> -          // No padding. That probably means no custom props
> -          return 0;
> -       }
>      }
>  
>  }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
