[julia-users] Re: [UCIMLRepo] New package for downloading datasets from UCI ML repositories

Avik Sengupta Tue, 01 Apr 2014 00:30:13 -0700

Hi Siddhant, 

It looks like the Python lxml package wraps the libxml2 parser. Julia also 
has a package that wraps libxml2: https://github.com/lindahua/LightXML.jl. 
I'm not sure whether the recovering (error-tolerant) parsing needed for HTML 
is exposed yet, but that should not be too difficult to add if necessary.


Regards
-
Avik

On Tuesday, 1 April 2014 07:13:34 UTC+1, Siddhant Jain wrote:
>
> As suggested by a few users on the IRC channel, I have added a function that 
> queries the UCI ML repository website and returns a list of the names of the 
> currently available datasets along with their default tasks.
>
> You can find the updated package here: 
> https://github.com/siddhantjain/UCIMLRepo.jl
>
> After cloning the package, use the function ucirepolist() to list all the 
> available datasets.
>
> However, I am not very happy with the speed of the function. Because no 
> HTML parser is available in Julia, I had to call a Python module (namely 
> lxml), which I believe is slowing the function down. Any ideas for a 
> workaround here?
> I would be really grateful if you could have a look at the code and suggest 
> other changes to improve the speed of the function.
>
