Re: [NTG-context] Downloading long urls
On 22-1-2011 1:20, Aditya Mahajan wrote:

> A simple algorithm will assume that everything following the dot is
> the extension, while that is certainly not the case here. We can
> definitely restrict the search for the extension to the last 10 or so
> characters of the url, but there will be cases where such heuristics
> will fail.

it's not that complicated ... say that you patch this way:

function schemes.cleanname(specification)
    return (gsub(specification.original,"[^%a%d%.]+","-"))
end

local function fetch(specification)
    local original  = specification.original
    local scheme    = specification.scheme
    local cleanname = schemes.cleanname(specification)

that will be the current method. Now you can experiment with:

\startluacode
    function resolvers.schemes.cleanname(specification)
        local name = specification.original
        local hash = file.addsuffix(md5.hex(name),file.suffix(specification.path))
        logs.simple("%s = %s",name,hash)
        return hash
    end
\stopluacode

Just see how that works out

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74
www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------

___________________________________________________________________
If your question is of interest to others as well, please add an
entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________
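[Editor's note: the difference between the two naming methods Hans sketches can be illustrated outside ConTeXt. The following Python sketch mimics the Lua gsub pattern and the md5-plus-suffix experiment; the function names and the use of Python's hashlib/re/urllib are my own illustration, not ConTeXt's actual code.]

```python
import hashlib
import re
from urllib.parse import urlsplit

def clean_strip(url):
    """Sanitize a url into a filename like the gsub pattern above:
    every run of characters other than letters, digits and dots
    becomes a single dash."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", url)

def clean_md5(url):
    """Hash the whole url and re-attach the suffix of its path,
    mirroring the md5.hex plus file.addsuffix experiment above."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    parts = urlsplit(url).path.rsplit(".", 1)
    return digest + "." + parts[1] if len(parts) == 2 else digest

url = "http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png"
print(clean_strip(url))  # http-contextgarden.files.wordpress.com-2008-08-logo-alt41.png
print(clean_md5(url))    # <32 hex chars>.png (the thread reports
                         # 667816068B899068327DA1EF013B3943 for this url)
```

The first name stays human-readable but grows with the url; the second is fixed-length regardless of how long the url gets, which is the trade-off discussed in this thread.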
Re: [NTG-context] Downloading long urls
On Sun, 23 Jan 2011, Hans Hagen wrote:

> it's not that complicated ... say that you patch this way:
>
> function schemes.cleanname(specification)
>     return (gsub(specification.original,"[^%a%d%.]+","-"))
> end
>
> local function fetch(specification)
>     local original  = specification.original
>     local scheme    = specification.scheme
>     local cleanname = schemes.cleanname(specification)
>
> that will be the current method. Now you can experiment with: [...]

Can cleanname be passed as a parameter of the specification? Then we can have

local cleanname = specification.cleanname or schemes.cleanname(specification)

This way, I can change the cleanname of only the files that are downloaded by my module, without affecting the cleanname for any other command that might want to download a file.

Aditya
Re: [NTG-context] Downloading long urls
On 23-1-2011 9:34, Aditya Mahajan wrote:

> Can cleanname be passed as a parameter of the specification? Then we
> can have
>
> local cleanname = specification.cleanname or schemes.cleanname(specification)
>
> This way, I can change the cleanname of only the files that are
> downloaded by my module, without affecting the cleanname for any
> other command that might want to download a file.

I made this ... as this is rather specialized tuning (that might confuse users) it's a directive:

\starttext

\enabletrackers [resolvers.schemes]
\enabledirectives[schemes.cleanmethod=md5]

\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]

\stoptext

currently 'strip' is the default but we can decide on md5

Hans
Re: [NTG-context] Downloading long urls
On Sun, 23 Jan 2011, Hans Hagen wrote:

> I made this ... as this is rather specialized tuning (that might
> confuse users) it's a directive: [...]
>
> currently 'strip' is the default but we can decide on md5

Thanks. I'll test it with my module.

Aditya
Re: [NTG-context] Downloading long urls
On Sun, 16 Jan 2011, Aditya Mahajan wrote:

> Is there a robust way to avoid this problem? One possibility is that
> in data-sch.lua, instead of
>
> local cleanname = gsub(original,"[^%a%d%.]+","-")
>
> use

local cleanname = md5.HEX(original) -- gsub(original,"[^%a%d%.]+","-")

appears to work correctly in my tests. The drawback of this scheme is that instead of

\externalfigure[url ending with .png]

one would have to use

\externalfigure[url ending with .png][method=png]

But \input 'url ending with .tex' still works. The other drawback is that the filenames in the cache will be gibberish. On the plus side, you can use long urls. Do you think that the drawbacks outweigh the gains?

I need this for the webfilter module, where the url can get pretty long. I can always write my own http_get function, but that would be mostly a repetition of data-sch.lua.

Aditya
Re: [NTG-context] Downloading long urls
On 21-1-2011 6:15, Aditya Mahajan wrote:

> local cleanname = md5.HEX(original) -- gsub(original,"[^%a%d%.]+","-")
>
> appears to work correctly in my tests. The drawback of this scheme is
> that instead of
>
> \externalfigure[url ending with .png]
>
> one would have to use
>
> \externalfigure[url ending with .png][method=png]
>
> But \input 'url ending with .tex' still works. The other drawback is
> that the filenames in the cache will be gibberish. On the plus side,
> you can use long urls. Do you think that the drawbacks outweigh the
> gains?

What exactly do you mean with the suffix issue? We can probably normalize things a bit.

Concerning the gibberish ... we can put a file alongside with some info. I need to think a bit about it, but indeed it makes no sense to have redundant mechanisms.

Hans
Re: [NTG-context] Downloading long urls
On Fri, 21 Jan 2011, Hans Hagen wrote:

> What exactly do you mean with the suffix issue?

Consider

\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png]

The current implementation downloads this file as

path-to-current-cache/http-contextgarden.files.wordpress.com-2008-08-logo-alt41.png

External figure then sees a file with a .png extension and correctly includes it. If you follow my suggestion, the file will be downloaded as

path-to-current-cache/667816068B899068327DA1EF013B3943

External figure then sees a file with no extension, assumes that the file is a pdf file, and the figure inclusion fails. To correct that, you need to add [method=png] to \externalfigure.

> We can probably normalize things a bit.

Agreed. Perhaps the best option would be a file name like

http-contextgarden.files.wordpress.com-667816068B899068327DA1EF013B3943.png

(that is, normalized base url + md5sum of url + extension). I am not sure if extensions can be calculated reliably from urls. In particular, imagine something like

http://www.bing.com/search?q=check+.extension+long+url+so+that+os+filename+limit+exceeds+

A simple algorithm will assume that everything following the dot is the extension, while that is certainly not the case here. We can restrict the search for the extension to the last 10 or so characters of the url, but there will be cases where such heuristics fail.

> Concerning the gibberish ... we can put a file alongside with some
> info. I need to think a bit about it, but indeed it makes no sense to
> have redundant mechanisms.

Thanks,
Aditya
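[Editor's note: the "normalized base url + md5sum + extension" naming scheme, together with the "last 10 or so characters" heuristic, can be sketched as follows. This is a hypothetical Python illustration of the idea under discussion, not code from data-sch.lua; the function names, the regexes, and the 10-character cutoff are assumptions taken from the paragraph above.]

```python
import hashlib
import re
from urllib.parse import urlsplit

def guess_suffix(url, maxlen=10):
    """Only treat a trailing '.xyz' as an extension when it is short
    and alphanumeric; a dot buried earlier in a long query string (as
    in the bing example above) is deliberately not picked up."""
    m = re.search(r"\.([A-Za-z0-9]{1,%d})$" % (maxlen - 1), url)
    return m.group(1) if m else ""

def cache_name(url):
    """Hypothetical cache name: normalized host + MD5 of the full url
    + guessed extension, mirroring the proposal above."""
    host = re.sub(r"[^A-Za-z0-9.]+", "-", urlsplit(url).netloc)
    digest = hashlib.md5(url.encode("utf-8")).hexdigest().upper()
    suffix = guess_suffix(url)
    name = "%s-%s" % (host, digest)
    return name + "." + suffix if suffix else name

print(cache_name("http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png"))
print(cache_name("http://www.bing.com/search?q=check+.extension+long+url+exceeds+"))
```

The name length stays bounded (host plus 32 hex characters plus a short suffix) no matter how long the query string grows, which is the point of hashing; the heuristic leaves the extension off rather than guessing wrongly.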
[NTG-context] Downloading long urls
Hi,

While downloading urls, context sanitizes the filename but does not check the length of the url. So one can end up with a situation where the filename is too long for the operating system to handle. For example, the following fails on 32bit linux.

\enabletrackers[resolvers.schemes]

\startluacode
    local report_webfilter = logs.new("thirddata.webfilter")

    local url = "http://www.bing.com/search?q=AreallyreallylongstringjusttoseehowthingsworkordontworkAreallyreallylongstringjusttoseehowthingsworkordontworkAreallyreallylongstringjusttoseehowthingsworkordontworkAreallyreallylongstringjusttoseehowthingsworkordontworAreallyreallylongstringjusttoseehowthingsworkordontworkkAreallyreallylongstringjusttoseehowthingsworkordontwork" ;

    local specification = resolvers.splitmethod(url)
    local file = resolvers.finders['http'](specification) or ""
    if file and file ~= "" then
        report_webfilter("saving file %s", file)
    else
        report_webfilter("download failed")
    end
\stopluacode

\normalend

Is there a robust way to avoid this problem? One possibility is that in data-sch.lua, instead of

local cleanname = gsub(original,"[^%a%d%.]+","-")

use

local cleanname = md5sum(original)

What do you think?

Aditya
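[Editor's note: the failure mode can be reproduced outside ConTeXt. In this Python sketch (my illustration; the 255-byte NAME_MAX and the regex standing in for the Lua gsub are assumptions), the sanitized name keeps the full length of the url, so it blows past the per-filename limit of typical Linux filesystems.]

```python
import re

NAME_MAX = 255  # typical per-component filename limit on Linux filesystems

def clean_strip(url):
    # stand-in for the Lua gsub(original,"[^%a%d%.]+","-") sanitization
    return re.sub(r"[^A-Za-z0-9.]+", "-", url)

url = "http://www.bing.com/search?q=" + "Areallyreallylongstringjusttoseehowthingswork" * 10
name = clean_strip(url)
print(len(name))             # well over NAME_MAX
print(len(name) > NAME_MAX)  # True: creating such a file fails (ENAMETOOLONG)
```

Sanitization only changes which characters appear in the name, never its length, which is why hashing (fixed 32-character output) is the natural fix.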