[
https://issues.apache.org/jira/browse/TIKA-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated TIKA-2602:
--------------------------------
Attachment: VERSION_Test
> iCalendar not properly recognized as text/calendar
> --------------------------------------------------
>
> Key: TIKA-2602
> URL: https://issues.apache.org/jira/browse/TIKA-2602
> Project: Tika
> Issue Type: Improvement
> Reporter: Andreas Meier
> Priority: Major
> Attachments: VERSION_Test
>
>
> At the moment, detection of text/calendar is covered by the following
> mime-type element:
> {code:xml}
> <mime-type type="text/calendar">
> <magic priority="50">
> <match value="BEGIN:VCALENDAR" type="string" offset="0">
> <match value="VERSION:2.0" type="string" offset="15:30"/>
> </match>
> </magic>
> <glob pattern="*.ics"/>
> <glob pattern="*.ifb"/>
> <sub-class-of type="text/plain"/>
> </mime-type>
> {code}
> This detection fails if VERSION:2.0 is not the first property after
> BEGIN:VCALENDAR, because the sub-match only looks for VERSION:2.0 starting
> at offsets 15 to 30.
> VERSION is not always the first property (see
> [https://tools.ietf.org/html/rfc5545|https://tools.ietf.org/html/rfc5545],
> section 3.6, "Calendar Components"), so detection may fail for calendar
> objects that start with PRODID or other properties.
> Section 4, "iCalendar Object Examples", shows some of these cases:
> {code}
> BEGIN:VCALENDAR
> PRODID:-//xyz Corp//NONSGML PDA Calendar Version 1.0//EN
> VERSION:2.0
> BEGIN:VEVENT
> DTSTAMP:19960704T120000Z
> UID:[email protected]
> ORGANIZER:mailto:[email protected]
> DTSTART:19960918T143000Z
> DTEND:19960920T220000Z
> STATUS:CONFIRMED
> CATEGORIES:CONFERENCE
> SUMMARY:Networld+Interop Conference
> DESCRIPTION:Networld+Interop Conference
> and Exhibit\nAtlanta World Congress Center\n
> Atlanta\, Georgia
> END:VEVENT
> END:VCALENDAR
> {code}
> or
> {code}
> BEGIN:VCALENDAR
> METHOD:xyz
> VERSION:2.0
> PRODID:-//ABC Corporation//NONSGML My Product//EN
> BEGIN:VEVENT
> DTSTAMP:19970324T120000Z
> SEQUENCE:0
> UID:[email protected]
> ORGANIZER:mailto:[email protected]
> ATTENDEE;RSVP=TRUE:mailto:[email protected]
> DTSTART:19970324T123000Z
> DTEND:19970324T210000Z
> CATEGORIES:MEETING,PROJECT
> CLASS:PUBLIC
> SUMMARY:Calendaring Interoperability Planning Meeting
> DESCRIPTION:Discuss how we can test c&s interoperability\n
> using iCalendar and other IETF standards.
> LOCATION:LDB Lobby
> ATTACH;FMTTYPE=application/postscript:ftp://example.com/pub/
> conf/bkgrnd.ps
> END:VEVENT
> END:VCALENDAR
> {code}
> I suggest either
> a) widening the offset range of the VERSION match from 15:30 to 15:200 or
> similar (not a good approach, since we don't know how long the PRODID might
> be), or
> b) adding sub-matches for CALSCALE, PRODID and METHOD. (This might still not
> cover everything, since there are also x-prop and iana-prop properties. So
> far I can only confirm that PRODID or METHOD occur as the first property
> after BEGIN:VCALENDAR.)
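> A minimal sketch of what option b) could look like, not a tested patch. It
> assumes that sibling match elements under the same parent are treated as
> alternatives (any one of them may match), and that the existing 15:30 window
> is enough to cover the line break after BEGIN:VCALENDAR. The values
> "PRODID:", "METHOD:" and "CALSCALE:" are illustrative and would still need
> to be checked against real files:
> {code:xml}
> <mime-type type="text/calendar">
>   <magic priority="50">
>     <match value="BEGIN:VCALENDAR" type="string" offset="0">
>       <!-- existing rule: VERSION is the first property -->
>       <match value="VERSION:2.0" type="string" offset="15:30"/>
>       <!-- assumed additions: other properties seen directly after BEGIN:VCALENDAR -->
>       <match value="PRODID:" type="string" offset="15:30"/>
>       <match value="METHOD:" type="string" offset="15:30"/>
>       <match value="CALSCALE:" type="string" offset="15:30"/>
>     </match>
>   </magic>
>   <glob pattern="*.ics"/>
>   <glob pattern="*.ifb"/>
>   <sub-class-of type="text/plain"/>
> </mime-type>
> {code}
> With this variant the VERSION value is no longer checked when another
> property comes first, so the match is looser than the current rule. Files
> that start with an x-prop or iana-prop property would still not be detected.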
> Regards
> Andreas