[ 
https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979614#comment-13979614
 ] 

Hong-Thai Nguyen commented on TIKA-1224:
----------------------------------------

Thanks [~ben.12] for the feedback.
For the line-return problem in the output, I created a new issue: TIKA-1279.
For the -t option in TikaCLI, the MIME type of a Java file is ambiguous. It could 
be text/plain (in which case TxtParser is used and returns the original text as 
is) or text/x-java-source (in which case SourceCodeParser is used).

For the -h option, the output normally looks something like this:
{code}
Author: Hong-Thai.Nguyen
Content-Encoding: windows-1252
Content-Length: 4899
Content-Type: text/x-java-source
LoC: 133
creator: Hong-Thai.Nguyen
dc:creator: Hong-Thai.Nguyen
meta:author: Hong-Thai.Nguyen
resourceName: SourceCodeParser.java
{code}
The creator comes from the 'author' annotation in the Javadoc.
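
To illustrate how values like LoC and creator can be derived from a source 
file, here is a minimal sketch in plain Java. It is not the code of 
SourceCodeParser: the class name SourceMetadataSketch and the helpers 
findAuthor and countLines are hypothetical, and counting physical lines is 
only an assumption about how LoC might be computed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Tika's SourceCodeParser.
public class SourceMetadataSketch {
    // Matches a Javadoc "@author" tag and captures the rest of that line.
    private static final Pattern AUTHOR = Pattern.compile("@author\\s+(.+)");

    /** Returns the first @author value found in the source, if any. */
    public static Optional<String> findAuthor(String source) {
        Matcher m = AUTHOR.matcher(source);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /** Counts physical lines; a trailing newline does not add an extra line. */
    public static int countLines(String source) {
        if (source.isEmpty()) {
            return 0;
        }
        int parts = source.split("\r?\n", -1).length;
        return source.endsWith("\n") ? parts - 1 : parts;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "/**",
            " * Example class.",
            " * @author Hong-Thai.Nguyen",
            " */",
            "public class Example {",
            "}",
            "");
        // Assumption: LoC means physical lines, blank and comment lines included.
        System.out.println("LoC: " + countLines(source));
        findAuthor(source).ifPresent(a -> System.out.println("creator: " + a));
    }
}
```

A dedicated Java parser would more likely work on a real syntax tree than on 
a regular expression; the regex here only keeps the sketch short.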

This parser is quite generic (quick and dirty, as mentioned by [~kkrugler]) and 
simplistic. We could write a more dedicated Java source parser that extracts more 
metadata (members, attributes...). If you are interested in this kind of parser, 
please create a new issue; an investigation into this work would be warmly welcome.

Regards,

> Adding Source code (Java, Groovy, C) parser
> -------------------------------------------
>
>                 Key: TIKA-1224
>                 URL: https://issues.apache.org/jira/browse/TIKA-1224
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Hong-Thai Nguyen
>            Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> For HTML rendering of code, we can use jhighlight: 
> http://www.ohloh.net/p/jhighlight



--
This message was sent by Atlassian JIRA
(v6.2#6252)
