Hi Siddhant,

It looks like the Python lxml package wraps the libxml2 parser. Julia also has a package that wraps libxml2: https://github.com/lindahua/LightXML.jl. I'm not sure whether the recoverable parsing needed for HTML is exposed there yet, but that should not be too difficult to add if necessary.

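
To show why recoverable parsing matters for HTML, here is a minimal sketch using only the Python standard library. It is not libxml2, lxml, or LightXML.jl, and the markup is a made-up fragment rather than the actual UCI page. It just illustrates that a strict XML parser rejects markup with unclosed tags, while a lenient HTML parser can still pull out the table cells. The names `SNIPPET`, `strict_parse_ok`, `CellCollector` and `lenient_rows` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment with unclosed <p>, <td> and <tr> tags,
# which is common in real-world HTML but is not well-formed XML.
SNIPPET = (
    "<table><tr><td><p>Iris<td>Classification"
    "<tr><td><p>Abalone<td>Regression</table>"
)


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class CellCollector(HTMLParser):
    """Collect the text of each <td>, tolerating missing end tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row implicitly ends whatever cell was open.
            self.rows.append([])
            self._in_cell = False
        elif tag == "td":
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def lenient_rows(markup):
    """Parse markup leniently and return the table as a list of rows."""
    collector = CellCollector()
    collector.feed(markup)
    collector.close()
    return collector.rows


if __name__ == "__main__":
    print("strict XML parse succeeded:", strict_parse_ok(SNIPPET))
    for row in lenient_rows(SNIPPET):
        print(row)
```

A libxml2-based wrapper would need to expose the equivalent lenient mode (libxml2 has a separate HTML parser for this) before it could handle pages like that.
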
Regards,
Avik

On Tuesday, 1 April 2014 07:13:34 UTC+1, Siddhant Jain wrote:
>
> As suggested by a few users on the IRC channel, I have added a function to
> query the UCI ML repository website and return a list of the names of the
> currently available datasets and their default tasks.
>
> You can find the updated package here:
> https://github.com/siddhantjain/UCIMLRepo.jl
>
> After cloning the package, please use the function ucirepolist()
> to list all the available datasets.
>
> However, I am not very happy with the speed of the function. Because no
> HTML parsers are available in Julia, I had to call Python modules
> (namely lxml), which I believe is slowing the function down. Any ideas for a
> workaround here?
> I would be really grateful if you could have a look at the code and suggest
> other changes to improve the speed of the function.
>
