Re: [julia-users] How to html parse ?

Stefan Karpinski Mon, 02 Mar 2015 07:50:44 -0800

James Porter has an HTML parsing package based on Google's Gumbo library:

https://github.com/porterjamesj/Gumbo.jl

However, what's happening here is that you're getting a 302 redirect
response, and the content is just a human-readable rendering of that
redirect. What you actually want to do is look at the response code and
headers to figure out which page to actually fetch and parse. Or you could
use curl with the appropriate options (for example -L, which follows
redirects) and it will do this for you.
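
As a rough sketch of the "look at the response code and headers" step, the
logic could look something like the following. It deliberately avoids any
HTTP package: next_url is a hypothetical helper, and the status code and
header dictionary are assumed to have been pulled out of whatever response
object your HTTP library returns. The Location value shown is assumed from
the link in the 302 page quoted below.

```julia
# Sketch: decide which URL to fetch next from a status code and headers.
# `next_url` is a hypothetical helper, not part of any package.

const REDIRECT_CODES = (301, 302, 303, 307, 308)

function next_url(status::Integer, headers::AbstractDict, requested::AbstractString)
    # Non-redirect responses: the requested URL is already the one to parse.
    status in REDIRECT_CODES || return requested
    # Header names are case-insensitive, so compare in lowercase.
    for (name, value) in headers
        lowercase(name) == "location" && return value
    end
    error("redirect status $status without a Location header")
end

# The response quoted in this thread: a 302 pointing at www4.rp.pl
# (Location value assumed from the link in the response body).
println(next_url(302, Dict("Location" => "http://www4.rp.pl/"), "http://rp.pl"))
println(next_url(200, Dict("Content-Type" => "text/html"), "http://www4.rp.pl/"))
```

You would then fetch the returned URL and hand that body to the parser. This
sketch does not resolve relative Location values or guard against redirect
loops, both of which a real client needs to handle.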

On Sun, Mar 1, 2015 at 4:07 PM, Paul Analyst <[email protected]> wrote:

>  Do you know of any HTML parser for Julia that does this?
> Paul
> On 2015-03-01 at 22:04, Jameson Nash wrote:
>
> That page isn't a JSON-formatted document. Perhaps you were looking for
> an HTML parser?
>
> On Sun, Mar 1, 2015 at 4:02 PM paul analyst <[email protected]> wrote:
>
>> For some pages the JSON parser works, but for most pages it does not,
>> as in the example below. How do I parse HTML?
>>
>> julia> get("http://rp.pl").data
>> "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML
>> 2.0//EN\">\n<html><head>\n<title>302 Found</title>\n</head><bod
>> Found</h1>\n<p>The document has moved <a href=\"http://www4.rp.pl/\
>> ">here</a>.</p>\n</body></html>\n"
>>
>> julia> JSON.parse(get("http://rp.pl").data)
>> ERROR: Unknown value
>> Line: 0
>> Around: ...<!DOCTYPE HTML PUBLIC...
>>            ^
>>
>>  in error at error.jl:19
>>
>> julia>
>>
>>
>> Paul
>>
>>
>
