Re: [NTG-context] Downloading long urls

2011-01-23 Thread Hans Hagen

On 22-1-2011 1:20, Aditya Mahajan wrote:


A simple algorithm will assume that everything following the dot is the
extension, while that is certainly not the case here. We can definitely
restrict the search for the extension to the last 10 or so characters of the
url, but there will be cases when such heuristics will fail.


it's not that complicated ... say that you patch this way:

function schemes.cleanname(specification)
    return (gsub(specification.original,"[^%a%d%.]+","-"))
end

local function fetch(specification)
    local original  = specification.original
    local scheme    = specification.scheme
    local cleanname = schemes.cleanname(specification)

that will be the current method. Now you can experiment with:

\startluacode
function resolvers.schemes.cleanname(specification)
    local name = specification.original
    local hash = file.addsuffix(md5.hex(name),file.suffix(specification.path))

    logs.simple("%s = %s",name,hash)
    return hash
end
\stopluacode

Just see how that works out.
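To experiment with the hash-plus-suffix naming outside ConTeXt, it can be modelled in plain Lua. This is a minimal sketch under stated assumptions. Plain Lua has no md5 library, so the hypothetical `djb2_hex` (the djb2 string hash, not md5) stands in for `md5.hex`. `suffix_of` only approximates `file.suffix`, and the path is passed explicitly instead of being read from `specification.path`.

```lua
-- djb2 string hash reduced modulo 2^32, printed as 8 hex digits.
-- Hypothetical stand-in for md5.hex: not cryptographic, only short and stable.
local function djb2_hex(s)
  local h = 5381
  for i = 1, #s do
    h = (h * 33 + s:byte(i)) % 4294967296
  end
  return string.format("%08x", h)
end

-- Rough approximation of file.suffix: the text after the last dot,
-- provided no slash follows it; nil otherwise.
local function suffix_of(path)
  return path:match("%.([^%./]+)$")
end

-- Hash of the whole url, with the path suffix appended when there is one,
-- so the cache name stays short but still ends in e.g. ".png".
local function hashed_name(url, path)
  local suffix = suffix_of(path)
  if suffix then
    return djb2_hex(url) .. "." .. suffix
  end
  return djb2_hex(url)
end

local url = "http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png"
print(hashed_name(url, "/2008/08/logo-alt41.png"))  -- 8 hex digits followed by .png
```

Keeping the suffix is what lets a later consumer such as \externalfigure still recognise the file type from the name alone.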

Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] Downloading long urls

2011-01-23 Thread Aditya Mahajan

On Sun, 23 Jan 2011, Hans Hagen wrote:


On 22-1-2011 1:20, Aditya Mahajan wrote:


A simple algorithm will assume that everything following the dot is the
extension, while that is certainly not the case here. We can definitely
restrict the search for the extension to the last 10 or so characters of the
url, but there will be cases when such heuristics will fail.


it's not that complicated ... say that you patch this way:

function schemes.cleanname(specification)
   return (gsub(specification.original,"[^%a%d%.]+","-"))
end

local function fetch(specification)
   local original  = specification.original
   local scheme    = specification.scheme
   local cleanname = schemes.cleanname(specification)

that will be the current method. Now you can experiment with:


Can cleanname be passed as a parameter of the specification? Then we can 
have


local cleanname = specification.cleanname or schemes.cleanname(specification)

This way, I can change the cleanname of only the files that are downloaded 
by my module, without affecting the cleanname for any other command that 
might want to download a file.
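The proposed lookup is a plain "override or default" pattern. A minimal plain-Lua sketch follows. The function name `cachename`, the example.org urls, and the precomputed name `webfilter-0001.png` are all hypothetical. The default reproduces the current strip method.

```lua
-- Default "strip" naming: each run of characters other than letters,
-- digits and dots becomes a single hyphen.
local function default_cleanname(specification)
  return (specification.original:gsub("[^%a%d%.]+", "-"))
end

-- Proposed lookup: use a name the calling module stored in
-- specification.cleanname, otherwise fall back to the default.
local function cachename(specification)
  return specification.cleanname or default_cleanname(specification)
end

local plain  = { original = "http://example.org/a b.png" }
local custom = { original = "http://example.org/a b.png",
                 cleanname = "webfilter-0001.png" }

print(cachename(plain))   -- http-example.org-a-b.png
print(cachename(custom))  -- webfilter-0001.png
```

Because the override lives in the specification, only requests built by the module carry it, and every other download keeps the default naming.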


Aditya


Re: [NTG-context] Downloading long urls

2011-01-23 Thread Hans Hagen

On 23-1-2011 9:34, Aditya Mahajan wrote:

On Sun, 23 Jan 2011, Hans Hagen wrote:


On 22-1-2011 1:20, Aditya Mahajan wrote:


A simple algorithm will assume that everything following the dot is the
extension, while that is certainly not the case here. We can definitely
restrict the search for the extension to the last 10 or so characters of the
url, but there will be cases when such heuristics will fail.


it's not that complicated ... say that you patch this way:

function schemes.cleanname(specification)
return (gsub(specification.original,"[^%a%d%.]+","-"))
end

local function fetch(specification)
local original  = specification.original
local scheme    = specification.scheme
local cleanname = schemes.cleanname(specification)

that will be the current method. Now you can experiment with:


Can cleanname be passed as a parameter of the specification? Then we can
have

local cleanname = specification.cleanname or schemes.cleanname(specification)

This way, I can change the cleanname of only the files that are
downloaded by my module, without affecting the cleanname for any other
command that might want to download a file.


I made this ... as this is rather specialized tuning (that might confuse 
users), it's a directive:


\starttext

\enabletrackers [resolvers.schemes]
\enabledirectives[schemes.cleanmethod=md5]

\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]

\stoptext

Currently 'strip' is the default, but we can decide on md5.

Hans



Re: [NTG-context] Downloading long urls

2011-01-23 Thread Aditya Mahajan

On Sun, 23 Jan 2011, Hans Hagen wrote:

I made this ... as this is rather specialized tuning (that might confuse 
users), it's a directive:


\starttext

\enabletrackers [resolvers.schemes]
\enabledirectives[schemes.cleanmethod=md5]

\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png][width=3cm]

\stoptext

Currently 'strip' is the default, but we can decide on md5.


Thanks. I'll test it with my module.

Aditya


Re: [NTG-context] Downloading long urls

2011-01-21 Thread Aditya Mahajan

On Sun, 16 Jan 2011, Aditya Mahajan wrote:

Is there a robust way to avoid this problem? One possibility is that in 
data-sch.lua instead of


   local cleanname = gsub(original,"[^%a%d%.]+","-")

use


local cleanname = md5.HEX(original) -- gsub(original,"[^%a%d%.]+","-")

appears to work correctly in my tests. The drawback of this scheme is that 
instead of


   \externalfigure[url ending with .png]

one would have to use

   \externalfigure[url ending with .png][method=png]

But \input 'url ending with .tex' still works.

The other drawback is that the filenames in the cache will be gibberish. But 
on the plus side, you can use long urls.


Do you think that the drawbacks outweigh the gains?

I need this for the webfilter module, where the url can get pretty long. I 
can always write my own http_get function, but that would mostly 
duplicate data-sch.lua.


Aditya


Re: [NTG-context] Downloading long urls

2011-01-21 Thread Hans Hagen

On 21-1-2011 6:15, Aditya Mahajan wrote:


local cleanname = md5.HEX(original) -- gsub(original,"[^%a%d%.]+","-")

appears to work correctly in my tests. The drawback of this scheme is
that instead of

\externalfigure[url ending with .png]

one would have to use

\externalfigure[url ending with .png][method=png]

But \input 'url ending with .tex' still works.

The other drawback is that the filenames in the cache will be gibberish. But
on the plus side, you can use long urls.

Do you think that the drawbacks outweigh the gains?


What exactly do you mean by the suffix issue? We can probably 
normalize things a bit. Concerning the gibberish ... we can put a file 
alongside it with some info. I need to think a bit about it, but indeed it 
makes no sense to have redundant mechanisms.
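A minimal sketch of the "file alongside with some info" idea follows. The sidecar layout is an assumption and not something ConTeXt does: a hypothetical `<cachefile>.url` holding just the original url. The helper names `write_info` and `read_info` are also hypothetical.

```lua
-- Hypothetical sidecar: next to a cache file with an opaque (hashed) name,
-- store the original url in "<cachefile>.url" so the cache stays browsable.
local function write_info(cachepath, url)
  local f = assert(io.open(cachepath .. ".url", "w"))
  f:write(url, "\n")
  f:close()
end

-- Returns the stored url, or nil when no sidecar exists.
local function read_info(cachepath)
  local f = io.open(cachepath .. ".url", "r")
  if not f then
    return nil
  end
  local url = f:read("*l")
  f:close()
  return url
end

-- Demo against a temporary file standing in for a cache entry.
local cachepath = os.tmpname()
write_info(cachepath, "http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png")
print(read_info(cachepath))
os.remove(cachepath .. ".url")
os.remove(cachepath)
```

A single index file mapping names to urls would work too. One sidecar per entry has the advantage that deleting a cache file and its sidecar together keeps the two in sync.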


Hans



Re: [NTG-context] Downloading long urls

2011-01-21 Thread Aditya Mahajan

On Fri, 21 Jan 2011, Hans Hagen wrote:


What exactly do you mean by the suffix issue?


Consider

 
\externalfigure[http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png]

The current implementation downloads this file as

path-to-current-cache/http-contextgarden.files.wordpress.com-2008-08-logo-alt41.png

Then \externalfigure sees a file with a .png extension and correctly 
includes it.


If you follow my suggestion, the file will be downloaded as

path-to-current-cache/667816068B899068327DA1EF013B3943

Then \externalfigure sees a file with no extension, assumes that the file 
is a PDF file, and the figure inclusion fails. To correct that, you need 
to add [method=png] to \externalfigure.
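The rule described here (type from suffix, no suffix means PDF) can be modelled in a few lines. This is not ConTeXt's actual figure-detection code. It is a hypothetical `guess_figure_type` applied to the two cache names quoted above, to show why the hashed name needs an explicit method.

```lua
-- Hypothetical model of the behaviour described above: the figure type is
-- taken from the file suffix, and a name without one is treated as PDF.
local function guess_figure_type(name)
  local suffix = name:match("%.(%w+)$")
  if suffix then
    return suffix:lower()
  end
  return "pdf"
end

print(guess_figure_type("http-contextgarden.files.wordpress.com-2008-08-logo-alt41.png")) -- png
print(guess_figure_type("667816068B899068327DA1EF013B3943"))                            -- pdf
```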



We can probably normalize things a bit.


Agreed. Perhaps the best option would be a file name like

http-contextgarden.files.wordpress.com-667816068B899068327DA1EF013B3943.png

(so normalized base url + md5sum of url + extension). I am not sure if 
extensions can be determined reliably from urls. In particular, imagine 
something like


http://www.bing.com/search?q=check+.extension+long+url+so+that+os+filename+limit+exceeds+

A simple algorithm will assume that everything following the dot is the 
extension, while that is certainly not the case here. We can definitely 
restrict the search for the extension to the last 10 or so characters of the 
url, but there will be cases when such heuristics will fail.
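One way to make the suffix guess less naive is to drop the scheme, host, query and fragment first. It then looks for a short suffix only in the last path segment. The sketch below is plain Lua under those assumptions. `url_suffix` and its `maxlen` parameter (default 10, echoing the "last 10 or so characters" above) are hypothetical.

```lua
-- Hypothetical suffix heuristic: drop the scheme and host, drop any query
-- ("?...") or fragment ("#..."), then accept a suffix of at most maxlen
-- letters/digits after the last dot of the final path segment.
local function url_suffix(url, maxlen)
  maxlen = maxlen or 10
  local rest = url:gsub("^%a[%w+.-]*://[^/?#]*", "") -- scheme and host
  rest = rest:gsub("[?#].*$", "")                     -- query and fragment
  local last = rest:match("[^/]*$")                   -- final path segment
  local suffix = last:match("%.(%w+)$")
  if suffix and #suffix <= maxlen then
    return suffix:lower()
  end
  return nil
end

print(url_suffix("http://contextgarden.files.wordpress.com/2008/08/logo-alt41.png"))
print(url_suffix("http://www.bing.com/search?q=check+.extension+long+url+so+that+os+filename+limit+exceeds+"))
```

The first call gives `png`. The second gives `nil`, because the dot sits in the query string. As noted above, it can still be fooled: a path such as `/download.php?id=42` yields `php` even if the server returns an image.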


Concerning the gibberish ... we can put a file alongside it with some info. 
I need to think a bit about it, but indeed it makes no sense to have 
redundant mechanisms.


Thanks,
Aditya


[NTG-context] Downloading long urls

2011-01-15 Thread Aditya Mahajan

Hi,

While downloading urls, ConTeXt sanitizes the filename but does not check 
the length of the url. So one can end up with a situation where the 
filename is too long for the operating system to handle. For example, the 
following fails on 32-bit Linux.


\enabletrackers[resolvers.schemes]
\startluacode
  local report_webfilter = logs.new("thirddata.webfilter")

  local url = "http://www.bing.com/search?q=AreallyreallylongstringjusttoseehowthingsworkordontworkAreallyreallylongstringjusttoseehowthingsworkordontworkAreallyreallylongstringjusttoseehowthingsworkordontworkAreallyreallylongstringjusttoseehowthingsworkordontworAreallyreallylongstringjusttoseehowthingsworkordontworkkAreallyreallylongstringjusttoseehowthingsworkordontwork"

  local specification = resolvers.splitmethod(url)

  -- an empty string signals that the download failed
  local file = resolvers.finders['http'](specification) or ""

  if file ~= "" then
    report_webfilter("saving file %s", file)
  else
    report_webfilter("download failed")
  end
\stopluacode

\normalend

Is there a robust way to avoid this problem? One possibility is that in 
data-sch.lua instead of


local cleanname = gsub(original,"[^%a%d%.]+","-")

use

local cleanname = md5.HEX(original)
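A variant of this proposal keeps the readable stripped name and hashes only when that name would be too long. The sketch below is plain Lua under stated assumptions. The 255-byte limit is the usual per-component NAME_MAX on common Linux file systems such as ext4, and other systems may differ. `safe_cleanname` is hypothetical, and the djb2 hash stands in for md5.HEX because plain Lua has no md5 library.

```lua
-- Assumption: 255 bytes per file-name component (NAME_MAX on common
-- Linux file systems such as ext4); other systems may differ.
local NAME_MAX = 255

-- Current "strip" method.
local function strip_name(url)
  return (url:gsub("[^%a%d%.]+", "-"))
end

-- djb2 stand-in for md5.HEX: 8 upper-case hex digits, not cryptographic.
local function djb2_hex(s)
  local h = 5381
  for i = 1, #s do
    h = (h * 33 + s:byte(i)) % 4294967296
  end
  return string.format("%08X", h)
end

-- Keep the readable stripped name when it fits, otherwise fall back to a
-- short fixed-length hash so the cache file can still be created.
local function safe_cleanname(url)
  local name = strip_name(url)
  if #name <= NAME_MAX then
    return name
  end
  return djb2_hex(url)
end

print(safe_cleanname("http://example.org/a.png"))  -- http-example.org-a.png
print(safe_cleanname("http://www.bing.com/search?q=" .. string.rep("x", 300)))
```

The hashed fallback still drops the suffix, so it has the same [method=...] drawback discussed later in the thread. A 32-bit hash also collides far more easily than md5 would.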

What do you think?

Aditya
