Hi,
This solution worked out very well:
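byte[] tmp = new byte[bufferSize];
int bytesRead;
try
{
    // read() may fill only part of tmp, so keep the count it returns
    if((bytesRead = chunkedIns.read(tmp)) != -1)
    {
        // decode only the bytes actually read (uses the platform default charset)
        return new String(tmp, 0, bytesRead);
    }
    else
    {
        // end of stream reached
        finish();
        return null;
    }
}
catch(IOException e)
{
    HTTPClientException e2 = new HTTPClientException(e.getMessage());
    e2.setStackTrace(e.getStackTrace());
    throw e2;
}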
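A related caveat, echoing Ken's point about encodings further down: decoding each byte chunk on its own with `new String(tmp, 0, bytesRead)` uses the platform default charset, and with a multibyte encoding such as UTF-8 a single character can be split across two chunks and come out garbled. One way around this is to wrap the stream in an `InputStreamReader` with an explicit charset and read characters instead of bytes. The sketch below assumes the body is UTF-8 (a real client would take the charset from the response's `Content-Type` header); `ChunkedTextReader` and `nextChunk` are hypothetical names standing in for `getChunk`, not part of HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient: reads text in chunks of at
// most bufferSize characters, decoding with an explicit charset.
class ChunkedTextReader {
    private final Reader reader;

    ChunkedTextReader(InputStream in, Charset charset) {
        // The reader keeps any incomplete multibyte sequence between calls,
        // so a character is never split across two chunks.
        this.reader = new InputStreamReader(in, charset);
    }

    /** Returns the next chunk of text, or null once the stream is exhausted. */
    String nextChunk(int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        int charsRead = reader.read(buf); // may be fewer than bufferSize
        if (charsRead == -1) {
            reader.close(); // end of stream: release the underlying stream
            return null;
        }
        // Only the first charsRead chars hold data; ignore the rest of buf.
        return new String(buf, 0, charsRead);
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo w\u00f6rld"; // the accented chars take 2 bytes each in UTF-8
        // Assumption: the body is UTF-8.
        InputStream body = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        ChunkedTextReader r = new ChunkedTextReader(body, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        String chunk;
        while ((chunk = r.nextChunk(3)) != null) {
            sb.append(chunk);
        }
        System.out.println(sb.toString().equals(text)); // prints true
    }
}
```

With `chunkedIns` from `getInputStream(url)` in place of the `ByteArrayInputStream`, the loop in `main` mirrors how `getChunk` is called repeatedly until it returns `null`.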
If it's not too much trouble, would anybody please explain why the buffer
may not be 100% full when I read it? I think it all depends on how the
implementation was done (in this case by Sun), and if Sun decided to
implement buffering this way, I don't understand the logic behind it.
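As far as the `InputStream` contract goes, this is not specific to Sun's implementation: `read(byte[] b)` blocks only until *some* input is available, then reads "up to" `b.length` bytes and returns how many it actually read. Over HTTP that is typically whatever part of the body has arrived so far, so short reads are normal. This also explains the NUL characters: a fresh `byte[]` is zero-initialized, so any tail of `tmp` that a short read did not overwrite stays `0x00`, and `new String(tmp)` turned those zeros into NULs. If a completely full buffer is really needed, the usual approach is to loop until it is full or the stream ends. In the sketch below, `readFully` and `TrickleInputStream` are hypothetical names; the latter merely simulates a network stream that delivers at most 3 bytes per call. (`java.io.DataInputStream.readFully` also exists, but it throws `EOFException` if the stream ends before the buffer is full, which is awkward for the last, shorter chunk.)

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FillBuffer {

    /**
     * Hypothetical helper: calls read() until buf is full or the stream ends.
     * Returns the number of bytes stored in buf (fewer than buf.length only
     * at end of stream), or -1 if the stream was already exhausted.
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return (total == 0 && buf.length > 0) ? -1 : total;
    }

    /** Test stream that returns at most maxPerRead bytes per call,
     *  imitating a body that arrives from the network in small pieces. */
    static class TrickleInputStream extends FilterInputStream {
        private final int maxPerRead;

        TrickleInputStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[4];

        // A single read() returns only what is "available": 3 bytes, not 4.
        InputStream single = new TrickleInputStream(new ByteArrayInputStream(data), 3);
        System.out.println("single read: " + single.read(buf)); // single read: 3

        // readFully loops until the buffer is full or the stream ends.
        InputStream in = new TrickleInputStream(new ByteArrayInputStream(data), 3);
        int n;
        while ((n = readFully(in, buf)) != -1) {
            // prints "4 abcd", "4 efgh", then "2 ij" for the short last chunk
            System.out.println(n + " " + new String(buf, 0, n, StandardCharsets.US_ASCII));
        }
    }
}
```

Even with a loop like this, the last chunk is usually shorter than the buffer, so the returned count still has to be passed to `new String(...)`.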
Thank you very much, Oleg, Ken and sebb-2-2, for your earlier input!
sebb-2-2 wrote:
>
> On 29/01/2010, amoldavsky <[email protected]> wrote:
>>
>> Hi Oleg,
>>
>> Let me rephrase the question in better terms:
>> If the server document is Y bytes and the buffer size is X, let's even
>> assume that Y = kX where X < Y: is it possible that any buffer x, with
>> 0 < x < (k-1), will not be fully filled?
>
> Remember that HTTP packets may be broken up in transit.
>
> However, even without that, it's never safe to assume that a buffer is
> filled.
>
> That's what the return value from read(buffer) is for - it tells you
> how many bytes are available.
>
>> Thanks!
>> -Assaf
>>
>>
>>
>> Ken Krugler wrote:
>> >
>> >
>> > On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
>> >
>> >>
>> >> Hi Oleg,
>> >> Thank you for the quick reply.
>> >>
>> >> So if there is a possibility that the whole buffer is not filled, how
>> >> can I ensure or force HttpClient to fill the whole buffer? Should I
>> >> maybe avoid Stream Readers altogether?
>> >
>> > If bufferSize is X, and the server document you're fetching has Y
>> > bytes, then what do you mean by "force HttpClient to fill the whole
>> > buffer"?
>> >
>> > At a minimum, you'd want
>> >
>> > int bytesRead = chunkedIns.read(tmp);
>> > if (bytesRead != -1) {
>> >     return new String(tmp, 0, bytesRead);
>> > }
>> >
>> > But that also uses the platform default encoding for the character
>> > set, which often won't be correct.
>> >
>> > -- Ken
>> >
>> >>
>> >> olegk wrote:
>> >>>
>> >>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
>> >>>> Hi
>> >>>>
>> >>>> I have coded a simple file downloader using HttpClient 4.0.
>> >>>> It works fine, but there is something wrong with either the String
>> >>>> encoding or the buffer stream. The problem is that there are long
>> >>>> sequences of "NULL" characters (ASCII code 00) throughout the final
>> >>>> file, like this:
>> >>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
>> >>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
>> >>>>
>> >>>> Here is the main code:
>> >>>>
>> >>>> public String getChunk(String url, int bufferSize) throws HTTPClientException
>> >>>> {
>> >>>>     if(!chunkedStarted)
>> >>>>     {
>> >>>>         chunkedIns = getInputStream(url);
>> >>>>         chunkedStarted = true;
>> >>>>     }
>> >>>>
>> >>>>     byte[] tmp = new byte[bufferSize];
>> >>>>     try
>> >>>>     {
>> >>>>         if(chunkedIns.read(tmp) != -1)
>> >>>>         {
>> >>>
>> >>> What makes you think that the entire buffer will be filled with data?
>> >>>
>> >>> Oleg
>> >>>
>> >>>
>> >>>>             return new String(tmp);
>> >>>>         }
>> >>>>         else
>> >>>>         {
>> >>>>             finish();
>> >>>>             return null;
>> >>>>         }
>> >>>>     }
>> >>>>     catch(IOException e)
>> >>>>     {
>> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
>> >>>>         e2.setStackTrace(e.getStackTrace());
>> >>>>         throw e2;
>> >>>>     }
>> >>>> }
>> >>>>
>> >>>> public void finish()
>> >>>> {
>> >>>>     // do some cleaning
>> >>>> }
>> >>>>
>> >>>> private InputStream getInputStream(String url) throws HTTPClientException
>> >>>> {
>> >>>>     InputStream instream = null;
>> >>>>
>> >>>>     httpClient = new DefaultHttpClient();
>> >>>>     httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
>> >>>>
>> >>>>     HttpGet httpGet = new HttpGet(url);
>> >>>>     HttpResponse response = null;
>> >>>>
>> >>>>     try
>> >>>>     {
>> >>>>         response = httpClient.execute(httpGet);
>> >>>>         HttpEntity entity = response.getEntity();
>> >>>>
>> >>>>         if(entity != null)
>> >>>>         {
>> >>>>             instream = entity.getContent();
>> >>>>         }
>> >>>>     }
>> >>>>     catch(ClientProtocolException e)
>> >>>>     {
>> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
>> >>>>         e2.setStackTrace(e.getStackTrace());
>> >>>>         throw e2;
>> >>>>     }
>> >>>>     catch(IOException e)
>> >>>>     {
>> >>>>         HTTPClientException e2 = new HTTPClientException(e.getMessage());
>> >>>>         e2.setStackTrace(e.getStackTrace());
>> >>>>         throw e2;
>> >>>>     }
>> >>>>
>> >>>>     return instream;
>> >>>> }
>> >>>>
>> >>>> getChunk and getInputStream could basically be one method, but I
>> >>>> split them for internal convenience; that does not change the
>> >>>> functionality as a whole.
>> >>>>
>> >>>> It seems like either the conversion from bytes to String is the
>> >>>> problem:
>> >>>> return new String(tmp);
>> >>>>
>> >>>> or the buffer is not getting filled to the end. The latter cannot be
>> >>>> the case, because the files are ~30 MB each and the buffer size is
>> >>>> 2 KB.
>> >>>>
>> >>>> I have attached the file; it's a CSV (shortened to ~6 KB). Note the
>> >>>> long whitespace between some of the URLs: if you just remove it, the
>> >>>> URLs make sense.
>> >>>> http://old.nabble.com/file/p27350930/datafeed.csv datafeed.csv
>> >>>>
>> >>>> Where can this whitespace (NULL) come from?
>> >>>>
>> >>>> Thanks!
>> >>>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: [email protected]
>> >>> For additional commands, e-mail: [email protected]
>> >>>
>> >>>
>> >>>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
>> >> Sent from the HttpClient-User mailing list archive at Nabble.com.
>> >
>> > --------------------------------------------
>> > Ken Krugler
>> > +1 530-210-6378
>> > http://bixolabs.com
>> > e l a s t i c w e b m i n i n g
>> >
>>