Hello Matteo,

Were you using the MapReduce ingest tool when you were running out of memory?
If so, do you know how big the file was that you were ingesting, how many
containers YARN allocated to your job, and how much memory was allocated to
each container?

Caleb A. Meier, Ph.D.
Senior Software Engineer ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
[email protected] ♦ www.parsons.com

-----Original Message-----
From: Matteo Cossu [mailto:[email protected]] 
Sent: Monday, August 28, 2017 8:04 PM
To: [email protected]
Subject: Re: Loading tab spaced data

I would like to help, but I still can't even test Rya properly. For my
research I am developing a similar system (using Spark SQL), and I wanted to
compare its performance with Rya's on the university cluster.
When I try to load large datasets with the Rya tools, the load always crashes
(mostly with out-of-memory errors) and never completes. At the moment I
urgently need to publish some results, so I am comparing my software with
other systems.
Later, I could come back to Rya and try to fix some bugs along the way :P

Best Regards,
Matteo Cossu

On 29 August 2017 at 01:29, Josh Elser <[email protected]> wrote:

> Hi Matteo,
>
> Thanks for the bug report. Are you interested in making the
> change to Rya to address this issue? :)
>
> In open source projects, we like to encourage users to make changes to 
> "scratch their own itch". Please let us know how we can help enable 
> you to make this change.
>
> On 8/25/17 8:45 AM, Matteo Cossu wrote:
>
>> Hello,
>> I am having problems loading data with the provided MapReduce code.
>> I am using this class:
>> *org.apache.rya.accumulo.mr.tools.RdfFileInputTool*.
>> When my input data is in N-Triples format and the triples are separated
>> by tabs instead of spaces, I get this error:
>>
>> *org.openrdf.rio.RDFParseException: Expected '<', found: m*
>>
>> I worked around it by replacing all the tabs with spaces in my input
>> data, but since tabs are a valid separator in the N-Triples format, I
>> think this should be handled (or fixed) directly within the tool.
>>
>> Kind Regards,
>> Matteo Cossu
>>
>>
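
The workaround described in the original report (replacing tabs with spaces
before ingest) could be scripted roughly as follows. This is only a minimal
sketch and is not part of Rya: the function name tabs_to_spaces is
hypothetical, and it assumes that no literal in the data contains a raw tab
character, since such tabs would be replaced as well.

```python
import io


def tabs_to_spaces(src, dst):
    """Copy lines from src to dst, replacing every tab with a single space.

    Hypothetical helper: src and dst are text file objects. The input is
    streamed line by line, so the whole file is never held in memory.
    Assumption: tabs occur only as separators between terms, never inside
    a literal.
    """
    for line in src:
        dst.write(line.replace("\t", " "))


# Small in-memory demonstration with one tab-separated triple.
sample = io.StringIO("<http://ex.org/s>\t<http://ex.org/p>\t<http://ex.org/o> .\n")
cleaned = io.StringIO()
tabs_to_spaces(sample, cleaned)
print(cleaned.getvalue(), end="")
```

For a real dataset the same function could be called with two files opened
via open(..., encoding="utf-8"), one for reading and one for writing.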
