Re: Loading tab spaced data

Puja Valiyil Tue, 29 Aug 2017 01:19:23 -0700

Hi Matteo,
Rya delegates parsing of input RDF files to the RDF parsers provided by
Sesame/OpenRDF, so the issue is due to a bug in the OpenRDF/Sesame parser:
it looks like the parser doesn't handle tabs. Upgrading Rya to the latest
release of OpenRDF might solve the issue. Aaron has brought that up as
something to do, but no one has started work on it yet, since it would
require several non-trivial changes.
Hope this helps!



> On Aug 28, 2017, at 8:03 PM, Matteo Cossu <[email protected]> wrote:
> 
> I would like to help, but I still can't even test Rya properly. I'm
> developing a similar system (using Spark SQL) for research, and I wanted to
> compare my software's performance with Rya on the university cluster.
> When I try to use these Rya tools to load the big datasets, it always
> crashes (mostly with out-of-memory errors) and doesn't finish loading.
> At the moment I urgently need to publish some results, so I am comparing
> my software with other systems.
> Later, I could come back to Rya and try to fix some bugs along the way :P
> 
> Best Regards,
> Matteo Cossu
> 
>> On 29 August 2017 at 01:29, Josh Elser <[email protected]> wrote:
>> 
>> Hi Matteo,
>> 
>> Thanks for the bug report. Do you have an interest in making the change to
>> Rya to address this issue? :)
>> 
>> In open source projects, we like to encourage users to make changes to
>> "scratch their own itch". Please let us know how we can help enable you to
>> make this change.
>> 
>>> On 8/25/17 8:45 AM, Matteo Cossu wrote:
>>> 
>>> Hello,
>>> I have some problems loading the data with the MapReduce code
>>> provided.
>>> I am using this class: *org.apache.rya.accumulo.mr.tools.RdfFileInputTool*.
>>> When my input data is in N-Triples format and the terms of each triple are
>>> separated by tabs instead of spaces, I get this error:
>>> 
>>> *org.openrdf.rio.RDFParseException: Expected '<', found: m*
>>> 
>>> I solved it by replacing all the tabs with spaces in my input data, but
>>> since tabs are a valid separator in the N-Triples format, I think this
>>> should be supported (or fixed) directly within the tool.
>>> 
>>> Kind Regards,
>>> Matteo Cossu
>>> 
>>>
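
Until the parser is upgraded, the workaround Matteo describes (replacing tabs with spaces before running the loader) can be scripted as a preprocessing step. The sketch below is a hypothetical helper, not part of Rya or OpenRDF, and the function and file names are illustrative. It assumes one statement per line and rewrites only the whitespace between terms, so any raw tabs inside a quoted literal, which a blind search-and-replace would also change, are left as they are.

```python
from typing import Iterable, Iterator


def normalize_line(line: str) -> str:
    """Collapse spaces/tabs between N-Triples terms into single spaces.

    Whitespace inside a double-quoted literal is copied unchanged, and
    leading/trailing whitespace is dropped. Assumes one statement (or
    comment) per line, with the line ending already stripped.
    """
    out = []
    in_literal = False     # currently inside "..."?
    escaped = False        # previous char was a backslash inside a literal
    pending_space = False  # saw separator whitespace, not yet emitted
    for ch in line:
        if in_literal:
            out.append(ch)  # keep literal content as-is, tabs included
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_literal = False  # an unescaped quote closes the literal
            continue
        if ch in " \t":
            pending_space = True
            continue
        if pending_space and out:
            out.append(" ")  # one space replaces the whole run of separators
        pending_space = False
        out.append(ch)
        if ch == '"':
            in_literal = True
    return "".join(out)


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Normalize lines lazily, one at a time."""
    for line in lines:
        yield normalize_line(line.rstrip("\r\n")) + "\n"


def normalize_file(src_path: str, dst_path: str) -> None:
    """Stream src_path into dst_path without loading the whole file."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(normalize(src))


# Example (hypothetical paths):
# normalize_file("dataset_tabs.nt", "dataset_spaces.nt")
```

The normalized copy can then be loaded with RdfFileInputTool as before. Because the file is processed line by line, memory use stays flat regardless of the dataset size.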
