+1 for Erik.
I would like to add that solving this problem is really, really easy.
If you're working from the shell, simply split the file using:
split -l 1 file
This writes each line (that is, each JSON document) to its own file. I will send you
the zip I made from this off-list.
You can then load each file individually as JSON using any of the MarkLogic tools that
handle JSON (including loading directly from XQuery).
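For anyone without a shell handy, here is a rough Java equivalent of `split -l 1`. It is a minimal sketch that assumes the feed has exactly one JSON document per line, as in the Twitter file discussed in this thread. The class name `SplitLines` and the output names `tweet-00000.json`, `tweet-00001.json`, ... are hypothetical; unlike `split`, it also skips blank lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SplitLines {
    // Writes each non-blank line of `input` to its own file in `outDir`,
    // roughly what `split -l 1` does. Returns the number of files written.
    // Assumption: one complete JSON document per line.
    public static int splitByLine(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines, e.g. keep-alive newlines in a feed
                }
                // Hypothetical naming scheme; split itself would use xaa, xab, ...
                Path out = outDir.resolve(String.format("tweet-%05d.json", count));
                Files.write(out, line.getBytes(StandardCharsets.UTF_8));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SplitLines twitterData.json outDir
        int n = splitByLine(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("wrote " + n + " files");
    }
}
```

Each resulting file is then a single valid JSON document that can be loaded on its own.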
If you are working from Java, it's also very easy. If the file is small, you can
read it into a string, split it on newlines, and get an array of strings, like:
String[] jsonData = fullString.split("\n");
for (String json : jsonData) {
    insertFile(json); // insertFile: your own method that writes one JSON document
}
If you are working from a stream (in either sense of the word, that is, you
don't have all the data at once), you can use a BufferedReader and read one
line at a time. Each line will be a separate JSON document.
You can then keep reading forever, or until EOF, and never have to buffer more
than one document at a time.
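A minimal sketch of the BufferedReader approach, assuming one JSON document per line. `LineStream` and `forEachDocument` are hypothetical names; the `handler` is where you would call your own write to MarkLogic, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineStream {
    // Reads newline-delimited JSON from any Reader (file, socket, HTTP body)
    // and hands each non-blank line to `handler` as one JSON document.
    // Only the current line is held in memory, so the input can be unbounded.
    public static long forEachDocument(Reader source, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() blocks until a full line is available and returns null at EOF
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank keep-alive lines
                }
                handler.accept(line); // e.g. write this document to the database
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String feed = "{\"id\":1}\n{\"id\":2}\n";
        List<String> docs = new ArrayList<>();
        forEachDocument(new StringReader(feed), docs::add);
        System.out.println(docs.size() + " documents");
    }
}
```

Because the handler is a plain `Consumer<String>`, the same loop works whether the Reader wraps a local file or a live network connection.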
-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Erik Hennum
Sent: Wednesday, August 28, 2013 4:27 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] unable to load twitter data json document
Hi, Ashok:
A stream of JSON objects can't be stored as a single JSON document because it
isn't a valid single JSON data structure.
If you really had to do so, you might be able to turn the stream into a single
valid JSON data structure by separating the JSON objects with commas and
wrapping them in a single JSON array, but that's not a good idea.
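To make the comma-and-array idea concrete, here is a minimal sketch, assuming each non-blank line is already a complete JSON object; `WrapAsArray` is a hypothetical name. It only illustrates why the result becomes one valid JSON value, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapAsArray {
    // Joins newline-delimited JSON objects into one JSON array: "[obj1,obj2,...]".
    // Assumes every non-blank line is already a complete, valid JSON value;
    // nothing is parsed or validated here.
    public static String toJsonArray(String ndjson) {
        List<String> items = new ArrayList<>();
        for (String line : ndjson.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                items.add(line.trim());
            }
        }
        return "[" + String.join(",", items) + "]";
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray("{\"a\":1}\n{\"b\":2}\n"));
    }
}
```

Note that the whole input has to be held in memory to build the array, which is one reason many small documents are the better choice.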
Many JSON documents will be much more useful for analysis than one big
one. For comparison, consider whether it would be more useful for analysis to
persist data as one big row or as many rows, or to store data in memory as one big
object or as many objects.
If you are working in Java, you might want to take a look at the JSON streaming
APIs. In particular, Jackson has a tutorial specifically about the twitter
stream:
http://www.cowtowncoder.com/blog/archives/2009/01/entry_132.html
While the tutorial implements databinding for Java objects, you shouldn't have
to do so. You could write the JSON tokens for each JSON object to MarkLogic as
a separate JSON document. Or, if you prefer the databinding approach, you
could persist each bean via JAXB as a separate XML document -- it's just a data
structure.
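If the objects are not reliably one per line, something has to find the object boundaries. Jackson's streaming parser is the robust way to do that; the sketch below is a dependency-free substitute (not Jackson's API) that assumes well-formed input and only tracks bracket depth, ignoring brackets inside string literals. `ObjectSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectSplitter {
    // Splits text holding several concatenated top-level JSON objects (or arrays)
    // into one string per object by tracking bracket depth. Brackets inside string
    // literals are ignored and backslash escapes inside strings are honored.
    // This does NOT validate JSON; it only finds object boundaries.
    public static List<String> split(String text) {
        List<String> docs = new ArrayList<>();
        int depth = 0;
        int start = -1;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;   // this character was escaped, e.g. \" or \\
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;  // end of string literal
                }
                continue;
            }
            switch (c) {
                case '"':
                    inString = true;
                    break;
                case '{':
                case '[':
                    if (depth == 0) {
                        start = i;     // beginning of a new top-level value
                    }
                    depth++;
                    break;
                case '}':
                case ']':
                    if (depth == 0) {
                        break;         // stray closer; ignore it
                    }
                    depth--;
                    if (depth == 0 && start >= 0) {
                        docs.add(text.substring(start, i + 1));
                        start = -1;
                    }
                    break;
                default:
                    break;             // whitespace between objects is skipped
            }
        }
        return docs;
    }
}
```

Each returned string could then be written to MarkLogic as its own JSON document, as suggested above.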
Erik Hennum
________________________________________
From: [email protected]
[[email protected]] on behalf of ashokkumar
[[email protected]]
Sent: Tuesday, August 27, 2013 11:41 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] unable to load twitter data json
document
So if the file is not split into its constituent JSON files, we cannot load it?
Does this mean MarkLogic doesn't support working with Twitter data? In this
case the file is small; what if there is a big file?
On 08/28/2013 11:48 AM, David Lee wrote:
> How did you test this file? It is not valid JSON. As I suspected, it is not
> one JSON file; it is many (common for feed APIs).
>
> You need to split this file into its constituent JSON files, typically by
> creating a new file at every line break. The Unix split command can do
> this.
>
>
> Sent from my iPad (excuse the terseness) David A Lee
> [email protected]
> 812-630-7622
>
>
> On Aug 27, 2013, at 11:03 PM, "ashokkumar"<[email protected]> wrote:
>
>> I am using ML 6. I checked that the format of the JSON file was correct. Below
>> I am attaching the zip of the file.
>> Please check it.
>>
>>
>> On 08/26/2013 08:13 PM, David Lee wrote:
>>> What version of ML is this?
>>> Could you zip up the full file?
>>> If this is Twitter data, it's likely a lot of JSON documents concatenated
>>> (separated by newlines).
>>> You need to split this up and send each document separately.
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Erik
>>> Hennum
>>> Sent: Monday, August 26, 2013 7:19 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] unable to load twitter data
>>> json document
>>>
>>> Hi, Ashok:
>>>
>>> It appears that the server's JSON parser can't parse the file at line 2.
>>>
>>> Can you see any problems or, if not, post the first 3 lines of the file?
>>>
>>>
>>> Erik Hennum
>>>
>>> ________________________________________
>>> From: [email protected]
>>> [[email protected]] on behalf of ashokkumar
>>> [[email protected]]
>>> Sent: Monday, August 26, 2013 6:56 AM
>>> To: [email protected]
>>> Subject: [MarkLogic Dev General] unable to load twitter data json
>>> document
>>>
>>> Hi all,
>>>
>>> I have a sample Twitter data file (600 KB, JSON). When I load this JSON
>>> file into MarkLogic through the Java API, it shows this error message:
>>> Exception in thread "main" com.marklogic.client.FailedRequestException:
>>> Local message: write failed: Bad Request. Server Message: XDMP-JSONCHAR:
>>> xdmp:from-json("{"delete":{"status":{"id":33296634...")
>>> -- Unexpected character: { on line 2
>>>     at com.marklogic.client.impl.JerseyServices.putDocumentImpl(JerseyServices.java:839)
>>>     at com.marklogic.client.impl.JerseyServices.putDocument(JerseyServices.java:740)
>>>     at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:336)
>>>     at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:271)
>>>     at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:231)
>>>     at MlWrite.run(MlWrite.java:47)
>>>     at MlWrite.main(MlWrite.java:18)
>>>
>>> Can anyone please help?
>>>
>>>
>>> Thanks
>>> Ashok kumar
>>> hadoop developer.
>>>
>>>
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>> <twitterData.json.zip>