[ https://issues.apache.org/jira/browse/CRUNCH-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wills updated CRUNCH-97:
-----------------------------
Attachment: CRUNCH-97v4.patch
[~mafr] Here is my interpretation of the Tokenizer idea: I turned the
ScannerFactory into a TokenizerFactory, where my Tokenizer is just a wrapper
around a Scanner that knows whether it should skip certain fields when it is
called. This ends up being less typing for the default case (there is no need
to specify indices for the extractors) while still supporting your use case.
> Add helpers for parsing PCollection<String> instances
> -----------------------------------------------------
>
> Key: CRUNCH-97
> URL: https://issues.apache.org/jira/browse/CRUNCH-97
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Reporter: Josh Wills
> Assignee: Josh Wills
> Fix For: 0.5.0
>
> Attachments: CRUNCH-97.patch, CRUNCH-97-take2.patch,
> CRUNCH-97-Tokenizer-v1.patch, CRUNCH-97v3.patch, CRUNCH-97v4.patch
>
>
> We should make it a bit easier to parse delimited text files into specific
> data types (e.g., ints, floats, etc.) or combinations of types, e.g., pairs
> of strings and ints, a Tuple3 of booleans, etc.