I use Spark's SerializableWritable to wrap CombineFileSplit so that I can pass the splits around, but I ran into serialization issues. While researching why my code fails, I found what appears to be a bug in CombineFileSplit:
CombineFileSplit doesn't serialize locations in write(DataOutput out), and it doesn't deserialize them in readFields(DataInput in). When CombineFileInputFormat creates a split, locations is an empty array (String[0]). After deserialization (default constructor, then readFields), however, locations is null, which leads to a NullPointerException (NPE) when the locations are accessed.
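A minimal, self-contained sketch of this failure mode follows. It assumes a simplified Writable-style split. `LocationSplit`, `roundTrip`, and the `writeLocations` flag are hypothetical names for illustration; this is not Hadoop's actual CombineFileSplit code. With `writeLocations` set to false, the field is skipped in write/readFields as described above. The copy created by the no-argument constructor plus `readFields` therefore keeps `locations == null`. With the flag set to true, the empty array survives the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class SplitSerializationSketch {

    // Hypothetical stand-in for a Writable-style split (NOT Hadoop's CombineFileSplit).
    public static class LocationSplit {
        long[] lengths;
        String[] locations;            // left null by the no-arg constructor
        final boolean writeLocations;  // false mimics the reported bug

        // "Default" constructor used before readFields: locations stays null.
        public LocationSplit(boolean writeLocations) {
            this.writeLocations = writeLocations;
        }

        public LocationSplit(long[] lengths, String[] locations, boolean writeLocations) {
            this.lengths = lengths;
            this.locations = locations;
            this.writeLocations = writeLocations;
        }

        void write(DataOutput out) throws IOException {
            out.writeInt(lengths.length);
            for (long len : lengths) out.writeLong(len);
            if (writeLocations) {          // the fix: also persist locations
                out.writeInt(locations.length);
                for (String loc : locations) out.writeUTF(loc);
            }
        }

        void readFields(DataInput in) throws IOException {
            int n = in.readInt();
            lengths = new long[n];
            for (int i = 0; i < n; i++) lengths[i] = in.readLong();
            if (writeLocations) {          // the fix: restore locations symmetrically
                int m = in.readInt();
                locations = new String[m];
                for (int i = 0; i < m; i++) locations[i] = in.readUTF();
            }
        }

        public String[] getLocations() { return locations; }
    }

    // Serializes a split to bytes and reads it back into a freshly constructed instance.
    public static LocationSplit roundTrip(LocationSplit src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            src.write(out);
            out.flush();
            LocationSplit copy = new LocationSplit(src.writeLocations);
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LocationSplit buggy = roundTrip(new LocationSplit(new long[]{10L, 20L}, new String[0], false));
        LocationSplit fixed = roundTrip(new LocationSplit(new long[]{10L, 20L}, new String[0], true));
        System.out.println("locations not serialized: " + Arrays.toString(buggy.getLocations())); // null
        System.out.println("locations serialized:     " + Arrays.toString(fixed.getLocations())); // []
    }
}
```

The design point is symmetry: whatever write() emits, readFields() must read back in the same order, and every field a caller may touch after deserialization must either be serialized or be given a safe default (such as an empty array) in the no-argument constructor.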