Hi Oleg,
Let me rephrase the question in better terms:
If the server document is Y bytes and the buffer size is X, and we even assume
that Y = kX with X < Y, is it possible that the x-th buffer, for any
0 < x < (k-1), will not be fully filled?
Thanks!
-Assaf
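
For reference on the question above: the contract of java.io.InputStream.read(byte[]) allows any call, not only the last one, to return after storing fewer bytes than the array can hold. A minimal sketch of a loop that keeps reading until the buffer is full or the stream ends follows. The class name FillBuffer, the helper readFully, and the 3-bytes-per-read test stream are hypothetical names made up for illustration; they are not part of HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FillBuffer {
    /**
     * Reads until buf is full or the stream ends.
     * Returns the number of bytes actually stored (0 at end of stream).
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // read() may store fewer bytes than requested, so ask again
            // for the remainder until the buffer is full.
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a network stream: hands out at most 3 bytes per read.
        InputStream trickle = new ByteArrayInputStream(new byte[10]) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
        byte[] buf = new byte[8];
        System.out.println(readFully(trickle, buf)); // prints 8
        System.out.println(readFully(trickle, buf)); // prints 2
    }
}
```

Even with such a loop, only the returned count of bytes is valid, so the caller should still convert just that many bytes rather than the whole array.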
Ken Krugler wrote:
>
>
> On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
>
>>
>> Hi Oleg,
>> Thank you for the quick reply.
>>
>> So if there is a possibility that the whole buffer is not filled, how can I
>> ensure or force HttpClient to fill the whole buffer? Should I maybe avoid
>> Stream Readers altogether?
>
> If bufferSize is X, and the server document you're fetching has Y
> bytes, then what do you mean by "force HttpClient to fill the whole
> buffer"?
>
> At a minimum, you'd want
>
>     int bytesRead = chunkedIns.read(tmp);
>     if (bytesRead != -1) {
>         return new String(tmp, 0, bytesRead);
>     }
>
> But that also uses the platform default encoding for the character
> set, which often won't be correct.
>
> -- Ken
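
Ken's remark about the platform default encoding can be sketched as follows. This is only an illustration under stated assumptions: the content is assumed to be UTF-8 (in practice the charset would have to come from the response), and the class name ChunkDecoder and the helper nextChunk are hypothetical. Wrapping the stream in an InputStreamReader with an explicit charset also avoids cutting a multi-byte character in half at a chunk boundary, which converting each byte chunk separately could do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkDecoder {
    /** Reads up to chunkSize chars; returns null at end of stream. */
    static String nextChunk(Reader reader, int chunkSize) throws IOException {
        char[] chars = new char[chunkSize];
        int n = reader.read(chars);
        // Use only the n chars actually read, never the whole array.
        return n == -1 ? null : new String(chars, 0, n);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response body; \u00e9 takes two bytes in UTF-8.
        String original = "a,b,\u00e9\n";
        InputStream in = new ByteArrayInputStream(
                original.getBytes(StandardCharsets.UTF_8));
        // Assumption: the document is UTF-8 encoded.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);

        StringBuilder out = new StringBuilder();
        String chunk;
        while ((chunk = nextChunk(reader, 4)) != null) {
            out.append(chunk);
        }
        System.out.println(out.toString().equals(original)); // prints true
    }
}
```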
>
>>
>> olegk wrote:
>>>
>>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
>>>> Hi
>>>>
>>>> I have coded a simple file downloader using HttpClient 4.0.
>>>> It works fine, but there is something wrong with the String encoding or
>>>> the buffer stream. The problem is that there are long sequences of "NULL"
>>>> (ANSI code 00) throughout the final file, like this:
>>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
>>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
>>>>
>>>> Here is the main code:
>>>>
>>>> public String getChunk(String url, int bufferSize) throws
>>>> HTTPClientException
>>>> {
>>>> if(!chunkedStarted)
>>>> {
>>>> chunkedIns = getInputStream(url);
>>>> chunkedStarted = true;
>>>> }
>>>>
>>>> byte[] tmp = new byte[bufferSize];
>>>> try
>>>> {
>>>> if(chunkedIns.read(tmp) != -1)
>>>> {
>>>
>>> What makes you think that the entire buffer will be filled with data?
>>>
>>> Oleg
>>>
>>>
>>>> return new String(tmp);
>>>> }
>>>> else
>>>> {
>>>> finish();
>>>> return null;
>>>> }
>>>> }
>>>> catch(IOException e)
>>>> {
>>>> HTTPClientException e2 = new
>>>> HTTPClientException(e.getMessage());
>>>> e2.setStackTrace(e.getStackTrace());
>>>> throw e2;
>>>> }
>>>> }
>>>>
>>>> public void finish()
>>>> {
>>>> // do some cleaning
>>>> }
>>>>
>>>> private InputStream getInputStream(String url) throws
>>>> HTTPClientException
>>>> {
>>>> InputStream instream = null;
>>>>
>>>> httpClient = new DefaultHttpClient();
>>>> httpClient.getParams().setParameter("http.useragent",
>>>> AGENT_NAME);
>>>>
>>>> HttpGet httpGet = new HttpGet(url);
>>>> HttpResponse response = null;
>>>>
>>>> try
>>>> {
>>>> response = httpClient.execute(httpGet);
>>>> HttpEntity entity = response.getEntity();
>>>>
>>>> if(entity != null)
>>>> {
>>>> instream = entity.getContent();
>>>> }
>>>> }
>>>> catch(ClientProtocolException e)
>>>> {
>>>> HTTPClientException e2 = new
>>>> HTTPClientException(e.getMessage());
>>>> e2.setStackTrace(e.getStackTrace());
>>>> throw e2;
>>>> }
>>>> catch(IOException e)
>>>> {
>>>> HTTPClientException e2 = new
>>>> HTTPClientException(e.getMessage());
>>>> e2.setStackTrace(e.getStackTrace());
>>>> throw e2;
>>>> }
>>>>
>>>> return instream;
>>>> }
>>>>
>>>> getChunk and getInputStream could basically be one method, but I need to
>>>> split them for internal convenience; that does not change the
>>>> functionality as a whole.
>>>>
>>>> It seems like either the conversion from bytes to string is a problem:
>>>> return new String(tmp);
>>>>
>>>> or the buffer is not getting filled to the end. The latter could not be
>>>> the case, because the files are ~30MB each and the buffer size is 2KB.
>>>>
>>>> I have attached the file; it's a CSV (shortened to ~6KB). Note the long
>>>> white space between some of the URLs; if you just remove it, the URL
>>>> makes sense.
>>>> http://old.nabble.com/file/p27350930/datafeed.csv datafeed.csv
>>>>
>>>> Where can this white space (null) come from?
>>>>
>>>> Thanks!
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
>> Sent from the HttpClient-User mailing list archive at Nabble.com.
>>
>>
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c w e b m i n i n g
>
--
View this message in context:
http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27377093.html
Sent from the HttpClient-User mailing list archive at Nabble.com.