Hi, 

Why is it so hard to get the target URL of a redirect? I have to get the 
ProtocolStatus out of the CrawlDatum's metadata and then take the first 
element of the ProtocolStatus' args? 

Can it have more than one arg? Is there a decent method to get the URL? At 
first I assumed the _repr_ key would return the target URL, but that key 
doesn't seem to exist for some test redirects I have.

ProtocolStatus z = (ProtocolStatus)
    crawlDatum.getMetaData().get(Nutch.WRITABLE_PROTO_STATUS_KEY);
String[] args = z.getArgs();
String targetUrl = args[0];

Any hints? Can I rely on the assumption that args[0] is always the 
CrawlDatum's target URL?

Thanks,
-- 
Markus Jelsma - CTO - Openindex
