Oleg Kalnichevski wrote:
We have already had a few reports of semantic incompatibilities between
IBM JSSE and Sun JSSE. It appears that the IBM JSSE implementation,
unlike Sun's, does not like attempts to set socket parameters when the
socket is closed. I believe it is clearly a bug in IBM JSSE, but we can
think about working around it in HttpClient.
That'd be grand if it's possible (we like the IBM JVMs' speed and more
detailed thread dumps). We used to subclass HttpClient so we could do
the below, moving the setting of the timeout until after the open.
HttpClient 3.0 now seemingly sets the timeout, etc., after the open, so our
subclass is no longer necessary (Hurray!).
// HERITRIX: Moved this timeout to after connection.open.
// connection.setSoTimeout(soTimeout);
if (!connection.isOpen()) {
    connection.setConnectionTimeout(connectionTimeout);
    connection.open();
    // HERITRIX: Move socket timeout here. It used to be done
    // before connection.open.
    connection.setSoTimeout(soTimeout);
...
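For anyone following along without the HttpClient source at hand, the same ordering can be sketched with a plain java.net.Socket: connect first, then set the read timeout on the open socket. This is only an illustration of the idea, not the HttpClient or Heritrix code; the class name TimeoutAfterOpen and the method open are made up for the example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not part of HttpClient or Heritrix.
public class TimeoutAfterOpen {

    /** Connects first, then applies SO_TIMEOUT to the already open socket. */
    static Socket open(String host, int port, int connectTimeoutMs, int soTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            // The connect timeout only bounds how long connect() may block.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // The read timeout is set only once the socket is open.
            socket.setSoTimeout(soTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Throwaway local listener so the example needs no network access.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = open(loopback.getHostAddress(), server.getLocalPort(), 2000, 5000)) {
            System.out.println("SO_TIMEOUT = " + client.getSoTimeout());
        }
    }
}
```

Setting the timeout only after connect() succeeds means no socket option is ever touched on a socket that is not open, which is the case the IBM JSSE reportedly objects to.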
+ inputStream = httpRecorder.inputWrap((InputStream)
+ (new BufferedInputStream(socket.getInputStream(),
+ inbuffersize)));
+ outputStream = httpRecorder.outputWrap((OutputStream)
+ (new BufferedOutputStream(socket.getOutputStream(),
+ outbuffersize)));
+ }
+ // END HERITRIX change.
+
What exactly does httpRecorder do? Perhaps we could think of a less
intrusive way of getting the same thing done.
HttpRecorder duplicates everything sent and received to files on disk. It wraps the
(buffered) socket streams with input/output streams that do the duplication.
Subsequently, the file is fed to a set of processors to do with as they wilt. Link
extraction is the main task performed by the processors.
We need to record what was sent over the wire, preserving order and all bytes sent back
and forth (we're trying to archive the web). If there's a less intrusive way of
getting what we need, we'd love to hear of it.
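To make the wrapping idea concrete, here is a minimal sketch of a recording input stream. It is not the actual HttpRecorder; the class name RecordingInputStream is invented for the example, and it copies into whatever OutputStream it is handed (an in-memory buffer here, where the real thing writes to a file on disk).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the wrapping described above; not Heritrix code.
public class RecordingInputStream extends FilterInputStream {
    private final OutputStream record;

    public RecordingInputStream(InputStream in, OutputStream record) {
        super(in);
        this.record = record;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) record.write(b);       // copy the byte just consumed
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) record.write(buf, off, n); // copy exactly what was read
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Read and discard so that skipped bytes are recorded too.
        byte[] scratch = new byte[4096];
        long skipped = 0;
        while (skipped < n) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, n - skipped));
            if (r < 0) break;
            skipped += r;
        }
        return skipped;
    }

    // Disallow mark/reset so no byte can be recorded twice.
    @Override public boolean markSupported() { return false; }
    @Override public void mark(int readLimit) { }
    @Override public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "HTTP/1.1 200 OK\r\n\r\nhello".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        try (InputStream in = new RecordingInputStream(new ByteArrayInputStream(wire), copy)) {
            while (in.read() != -1) { /* drain the stream */ }
        }
        System.out.println("recorded all bytes: " + Arrays.equals(wire, copy.toByteArray()));
    }
}
```

An output-side wrapper would be the mirror image (a FilterOutputStream that writes each buffer to both destinations).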
...
+ value = new StringBuffer(line.substring(colon + 1).trim());
}
- name = line.substring(0, colon).trim();
- value = new StringBuffer(line.substring(colon + 1).trim());
+ // END HERITRIX change.
}
This is a known problem. Basically, it appears there's no one right way
to parse the HTTP status line and headers that fits all types of
applications. Our plan is to provide a plugin mechanism for custom HTTP
parsers in version 4:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25468
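Purely as an illustration of what "pluggable" could mean here, and not the planned HttpClient 4 design, a header-line parser might sit behind a small interface so that a crawler can swap in a more forgiving one. All names below (HeaderParsing, HeaderLineParser, STRICT, LENIENT) are hypothetical, and the lenient rule shown is just one possible choice, not necessarily what the Heritrix patch does.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical sketch; not the HttpClient API.
public class HeaderParsing {

    /** Assumed plug-in point: turns one raw header line into a name/value pair. */
    interface HeaderLineParser {
        Map.Entry<String, String> parse(String line);
    }

    /** Strict variant: a line without a colon is rejected. */
    static final HeaderLineParser STRICT = line -> {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No colon in header line: " + line);
        }
        return new SimpleEntry<>(line.substring(0, colon).trim(),
                                 line.substring(colon + 1).trim());
    };

    /** Lenient variant: a line without a colon becomes a header with an empty value. */
    static final HeaderLineParser LENIENT = line -> {
        if (line.indexOf(':') < 0) {
            return new SimpleEntry<>(line.trim(), "");
        }
        return STRICT.parse(line);
    };

    public static void main(String[] args) {
        Map.Entry<String, String> h = LENIENT.parse("Content-Type: text/html");
        System.out.println(h.getKey() + " -> " + h.getValue());
    }
}
```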
Sounds good.
Thanks again for the software.
Yours,
St.Ack
Cheers,
Oleg