[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979614#comment-13979614 ]
Hong-Thai Nguyen commented on TIKA-1224:
----------------------------------------

Thanks [~ben.12] for the feedback.

For the line-return problem in the output, I created a new issue: TIKA-1279

For the -t option in TikaCLI, the mimetype of a Java file is ambiguous. It could be text/plain (in which case TxtParser is used and returns the original text as is) or text/x-java-source (in which case SourceCodeParser is used).

For the -h option, the output normally looks something like this:
{code}
Author: Hong-Thai.Nguyen
Content-Encoding: windows-1252
Content-Length: 4899
Content-Type: text/x-java-source
LoC: 133
creator: Hong-Thai.Nguyen
dc:creator: Hong-Thai.Nguyen
meta:author: Hong-Thai.Nguyen
resourceName: SourceCodeParser.java
{code}
The creator value comes from the 'author' annotation in the Javadoc.

This parser is quite generic (quick and dirty, as mentioned by [~kkrugler]) and simplistic. We could write a more dedicated Java source parser that extracts more metadata (members, attributes...). If you are interested in this kind of parser, please create a new issue; an investigation into this work would be warmly welcome.

Regards,

> Adding Source code (Java, Groovy, C) parser
> -------------------------------------------
>
>                 Key: TIKA-1224
>                 URL: https://issues.apache.org/jira/browse/TIKA-1224
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Hong-Thai Nguyen
>            Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> For HTML rendering of the code, we can use jhighlight:
> http://www.ohloh.net/p/jhighlight

--
This message was sent by Atlassian JIRA
(v6.2#6252)