[
https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144856#comment-13144856
]
Joe Prasanna Kumar commented on MAHOUT-155:
-------------------------------------------
After adding more test data for date formats, I ran into some interesting
issues.
1. When the name of an attribute starts with the name of a data type, for
example "dateOfFirstPurchase", the iterator treated the attribute as a date
type and tried to create a date out of "OfFirstPurchase". I've modified
ARFFVectorIterable and ARFFType to fix this.
2. If there was a comma inside a date or string value, each part was treated
as a separate value. For example, "0:08 PM, PDT" was treated as two strings,
"0:08 PM" and "PDT". In ARFFIterator, I've modified COMMA_PATTERN to be
",(?=([^\"]*\"[^\"]*\")*[^\"]*$)". This splits on a comma only if that comma
has zero or an even number of quotes ahead of it. Credit for this regex
pattern goes to an answer on Stack Overflow.
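The two fixes above can be sketched roughly as follows. This is a minimal
illustration and not the code in the patch: the class name ArffParsingSketch
and the helpers splitFields and typeKeyword are hypothetical, and typeKeyword
assumes an unquoted attribute name without spaces. Only the regex itself is
taken from the comment.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; names here are illustrative and not part of Mahout.
class ArffParsingSketch {

  // Regex quoted in the comment: match a comma only when the remainder of
  // the line holds an even number of double quotes, i.e. the comma lies
  // outside any quoted value.
  static final Pattern COMMA_PATTERN =
      Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

  // Splits one ARFF data line into fields, keeping quoted commas intact.
  static String[] splitFields(String line) {
    return COMMA_PATTERN.split(line);
  }

  // Assumed approach to issue 1: read the type from the token that follows
  // the attribute name instead of searching the whole line for a type
  // keyword, so a name like "dateOfFirstPurchase" is never taken for a date.
  static String typeKeyword(String attributeLine) {
    String[] parts = attributeLine.trim().split("\\s+", 3);
    if (parts.length < 3 || !parts[0].equalsIgnoreCase("@attribute")) {
      throw new IllegalArgumentException("Not an @attribute line: " + attributeLine);
    }
    String spec = parts[2];
    if (spec.startsWith("{")) {
      return "nominal"; // nominal attributes list their values in braces
    }
    // The first word of the spec is the type; a date format may follow it.
    return spec.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(splitFields("1,\"0:08 PM, PDT\",foo")));
    System.out.println(typeKeyword("@attribute dateOfFirstPurchase numeric"));
    System.out.println(typeKeyword("@attribute purchased date \"yyyy-MM-dd\""));
  }
}
```

Note that the lookahead rescans the rest of the line for every comma, so the
split is quadratic in line length, and it does not handle escaped quotes
inside a quoted value.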
I have extended the test case with a few more date formats and they all seem
to work now.
The updated patch is attached to this issue. Because I formatted the code
using the template available at
https://cwiki.apache.org/MAHOUT/how-to-contribute.data/Mahout-Eclipse-Codeformatter.xml
the diff is quite large.
Please test with this patch and, if it all looks good, maybe we can close
this issue.
Joe.
> ARFF VectorIterable
> -------------------
>
> Key: MAHOUT-155
> URL: https://issues.apache.org/jira/browse/MAHOUT-155
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Labels: MAHOUT_INTRO_CONTRIBUTE
> Attachments: MAHOUT-155-DateTestAndFix.patch, MAHOUT-155.patch
>
>
> Convert ARFF to Vector. See http://www.cs.waikato.ac.nz/~ml/weka/arff.html
> Create a VectorIterable implementation for ARFF.