[ https://issues.apache.org/jira/browse/GIRAPH-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558410#comment-13558410 ]
Eli Reisman commented on GIRAPH-483:
------------------------------------
This is likely an area where I'll be putting up some patches to break all this
stuff out in preparation for pure YARN support, so maybe I'll take this one.
The proposed GiraphInputSplit interface might be needed so we can create our
own input splits without relying on Hadoop code to do it for us.
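The following is a minimal sketch of the GiraphInputSplit interface proposed in the issue description quoted below. It assumes the interface simply combines Hadoop's Writable contract with getLocations(). WritableLike is a hypothetical stand-in for org.apache.hadoop.io.Writable with the same two methods, so the example compiles without Hadoop. HostListSplit is a hypothetical split that Giraph could build on its own, without Hadoop's InputFormat machinery.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
interface WritableLike {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Sketch of the proposed interface: always serializable, and exposes only
// getLocations(), since getLength() is never used.
interface GiraphInputSplit extends WritableLike {
  String[] getLocations();
}

// Hypothetical split Giraph could create itself: just a list of preferred hosts.
class HostListSplit implements GiraphInputSplit {
  private String[] locations = new String[0];

  // No-arg constructor so a worker can instantiate the split, then fill it via readFields().
  HostListSplit() {}

  HostListSplit(String... locations) {
    this.locations = locations.clone();
  }

  @Override
  public String[] getLocations() {
    return locations.clone();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(locations.length);
    for (String host : locations) {
      out.writeUTF(host);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int n = in.readInt();
    locations = new String[n];
    for (int i = 0; i < n; i++) {
      locations[i] = in.readUTF();
    }
  }
}

public class GiraphInputSplitSketch {
  // Serializes a split to bytes and reads it back into a fresh instance,
  // mimicking what happens when a split is handed from one process to another.
  static HostListSplit roundTrip(HostListSplit split) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    split.write(new DataOutputStream(bytes));
    HostListSplit copy = new HostListSplit();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy;
  }

  public static void main(String[] args) throws IOException {
    HostListSplit copy = roundTrip(new HostListSplit("host1", "host2"));
    System.out.println(Arrays.toString(copy.getLocations()));
  }
}
```

Because the sketch drops getLength(), formats that currently extend Hadoop's InputSplit would need adapting. That is the breakage the issue warns about.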
> InputSplit needs to be Writable
> -------------------------------
>
> Key: GIRAPH-483
> URL: https://issues.apache.org/jira/browse/GIRAPH-483
> Project: Giraph
> Issue Type: Improvement
> Reporter: Nitay Joffe
> Priority: Minor
>
> Working on Hive I/O recently, I found this out the hard way...
> We use InputSplit in Giraph in order to make things work easily with Hadoop.
> However, our usage of the interface is not actually consistent. Specifically,
> in InputSplitsCallable#getInputSplit we have the following:
> ((Writable) inputSplit).readFields(inputStream);
> This means our InputSplit has to be Writable. If it's not (as mine wasn't
> initially when implementing a new input format), things break badly. As a
> simple first step, we should at least put an instanceof check around that
> cast and throw an informative error message.
> Furthermore, looking deeper into it, I noticed we never actually use the
> getLength() method in InputSplit, just getLocations(). So really the "right"
> way to do this, IMO, is to have our own GiraphInputSplit interface, which
> extends Writable and declares the getLocations() method.
> Doing this is tricky, though, as it will likely break existing I/O formats,
> so it will require some care...
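Here is a rough sketch of the suggested "instanceof check around that cast" with an informative error message. It is not the actual code in InputSplitsCallable#getInputSplit. readSplitFields is a hypothetical helper, and WritableStandIn is a hypothetical stand-in for org.apache.hadoop.io.Writable so the example compiles without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
interface WritableStandIn {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

public class CheckedSplitReader {
  // Deserializes into the split only if it is actually Writable. Otherwise it
  // fails with a message naming the offending class, instead of a bare
  // ClassCastException from the unchecked cast.
  static void readSplitFields(Object inputSplit, DataInput in) throws IOException {
    if (!(inputSplit instanceof WritableStandIn)) {
      String name = (inputSplit == null) ? "null" : inputSplit.getClass().getName();
      throw new IllegalStateException(
          "readSplitFields: InputSplit class " + name
              + " must implement Writable to be deserialized");
    }
    ((WritableStandIn) inputSplit).readFields(in);
  }

  // A split that forgot to implement Writable, as in the issue description.
  static class PlainSplit {}

  public static void main(String[] args) throws IOException {
    DataInput empty = new DataInputStream(new ByteArrayInputStream(new byte[0]));
    try {
      readSplitFields(new PlainSplit(), empty);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast with the class name points an input-format author straight at the missing Writable implementation.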