[
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530448#comment-17530448
]
Dan Coldrick commented on TIKA-3742:
------------------------------------
Hi [~nick]
I'm struggling: I can see there are deleted elements that I want to exclude from the
parser output, but I can't work out how to do that in Java.
They come out in the DGN dump with a DELETED attribute:
{code:java}
Element:Text Level:27 id:19707 (DELETED)
offset=1959730 size=74 bytes
graphic_group:0 color:0 weight:0 style:0
properties=1536,MODIFIED,NEW
origin=(963453.83000,96730.11000), rotation=272.763292
font=1, just=2, length_mult=119.99, height_mult=119.99
string = "HARVARD RD" {code}
From the core element structure, the flag should be in the fixed header:
{code:java}
The first 18 words of an element in the design file are its fixed header --
containing the element type, level, words to follow, and range
information. The C declaration for this header is as follows
typedef struct
{
    unsigned level:6;       /* level element is on */
    unsigned :1;            /* reserved */
    unsigned complex:1;     /* component of complex elem. */
    unsigned type:7;        /* type of element */
    unsigned deleted:1;     /* set if element is deleted */
    unsigned short words;   /* words to follow in element */
    unsigned long xlow;     /* element range - low */
    unsigned long ylow;
    unsigned long zlow;
    unsigned long xhigh;    /* element range - high */
    unsigned long yhigh;
    unsigned long zhigh;
} Elm_hdr; {code}
You already get the type out (which I think comes from the same header structure):
{code:java}
int h2 = tstream.read();
int type = h2 & 0x7f; {code}
How do I get the deleted attribute out so I can remove those elements from the parsed
content? You also mentioned type 37, but I don't have any examples that contain
type 37 elements.
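One reading of the Elm_hdr layout above, offered as a sketch rather than a confirmed answer: if the bit-fields are packed from the least significant bit of a little-endian 16-bit word (which is what the existing {{h2 & 0x7f}} mask for the type implies), then the deleted flag would be the top bit of that same second byte, and the level and complex flag would sit in the first byte. The class and method names below are hypothetical and not part of Tika or dgnlib.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: decodes the first two bytes of a DGN7 element header.
// Assumes bit-fields are allocated LSB-first in a little-endian word, which
// matches the existing "type = h2 & 0x7f" mask on the second byte.
final class DgnHeaderBits {
    private DgnHeaderBits() {}

    // First byte: bits 0-5 = level, bit 6 = reserved, bit 7 = complex.
    static int level(int h1) {
        return h1 & 0x3f;
    }

    static boolean isComplex(int h1) {
        return (h1 & 0x80) != 0;
    }

    // Second byte: bits 0-6 = type, bit 7 = deleted.
    static int type(int h2) {
        return h2 & 0x7f;
    }

    static boolean isDeleted(int h2) {
        return (h2 & 0x80) != 0;
    }

    public static void main(String[] args) throws IOException {
        // Made-up header bytes: level 27, type 17, deleted bit set.
        InputStream tstream = new ByteArrayInputStream(new byte[] {0x1B, (byte) 0x91});
        int h1 = tstream.read();
        int h2 = tstream.read();
        // Prints: type=17 level=27 complex=false deleted=true
        System.out.println("type=" + type(h2) + " level=" + level(h1)
                + " complex=" + isComplex(h1) + " deleted=" + isDeleted(h2));
    }
}
```

With something like this, the parser could test {{isDeleted(h2)}} right after reading the type and skip the element's content when it returns true. This should be checked against a file where dgndump reports (DELETED) elements before relying on it.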
I've created a fork with some rough test code in:
[https://github.com/monkmachine/tika/tree/TIKA-3742/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/main/java/org/apache/tika/parser/dgn]
> Advice around DGN7 parser and whether to add to TIKA
> ----------------------------------------------------
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
> Issue Type: Task
> Components: parser
> Reporter: Dan Coldrick
> Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else.
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/] for
> DGN7, which produces a dgndump.exe that will dump all the data from the DGN.
> From my initial testing it looks pretty good.
> Do you think it would be worth adding this, or should I just keep it as a custom
> parser rather than in the main source code? It's under the MIT license. I've
> attached the exe (zipped), a copy of the output from the dump, and my very
> dirty test code that calls the exe (I was only interested in the strings,
> so I'm only pulling those into a string array at the moment to check it's
> pulling out the correct data).