Re: HttpClient 4.0 encoding madness

amoldavsky Fri, 29 Jan 2010 11:30:15 -0800

Hi Oleg,

Let me rephrase the question in better terms:
If the server document is Y bytes and the buffer size is X bytes, and let's
even assume that Y = kX with X < Y, is it possible that any read x, where
0 < x < (k-1), will not fill the buffer completely?
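
To make the question concrete, here is a minimal sketch of the situation.
Nothing in it comes from HttpClient: TricklingStream is a hypothetical
stand-in for a network stream that hands back at most 1500 bytes per call,
and readFully is an illustrative helper name. The InputStream contract only
promises that read(byte[]) stores at least one byte (or returns -1 at end of
stream), so a single call can leave the tail of the buffer untouched even
when plenty of data remains. Looping on the returned count is one way to
fill the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ShortReadDemo {

    /** Hypothetical stream that returns at most `limit` bytes per read call. */
    static class TricklingStream extends InputStream {
        private final InputStream in;
        private final int limit;

        TricklingStream(InputStream in, int limit) {
            this.in = in;
            this.limit = limit;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately hand back fewer bytes than requested.
            return in.read(b, off, Math.min(len, limit));
        }
    }

    /** Reads until the buffer is full or the stream ends; returns the bytes stored. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream before the buffer filled up
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = new byte[8192];          // Y = 4 * X
        Arrays.fill(doc, (byte) 'a');
        InputStream in = new TricklingStream(new ByteArrayInputStream(doc), 1500);

        byte[] buf = new byte[2048];          // X
        int single = in.read(buf);            // one call: only 1500 bytes stored
        int filled = readFully(in, buf);      // loop: all 2048 bytes stored
        System.out.println(single + " " + filled); // prints "1500 2048"
    }
}
```

With a single read, bytes 1500..2047 of the buffer keep their initial zero
value, which is where NUL characters would come from if the whole array is
then turned into a String.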


Thanks!
-Assaf


Ken Krugler wrote:
> 
> 
> On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
> 
>>
>> Hi Oleg,
>> Thank you for the quick reply.
>>
>> So if there is a possibility that the whole buffer is not filled, how can I
>> ensure or force HttpClient to fill the whole buffer? Should I maybe avoid
>> Stream Readers altogether?
> 
> If bufferSize is X, and the server document you're fetching has Y  
> bytes, then what do you mean by "force HttpClient to fill the whole  
> buffer"?
> 
> At a minimum, you'd want
> 
> int bytesRead = chunkedIns.read(tmp);
> if (bytesRead != -1) {
>     return new String(tmp, 0, bytesRead);
> }
> 
> But that also uses the platform default encoding for the character  
> set, which often won't be correct.
> 
> -- Ken
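
On the encoding point, a minimal sketch of one way to avoid the platform
default charset. The class name ChunkDecoder and the choice of UTF-8 are
assumptions for illustration; in practice the charset would normally come
from the response's Content-Type header. Wrapping the stream in an
InputStreamReader, rather than decoding each byte chunk on its own, also
avoids cutting a multi-byte character in half at a chunk boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ChunkDecoder {
    private final Reader reader;

    ChunkDecoder(InputStream in, Charset charset) {
        // The reader keeps decoder state between calls, so characters that
        // straddle two network reads are still decoded correctly.
        this.reader = new InputStreamReader(in, charset);
    }

    /** Returns the next chunk of at most maxChars characters, or null at end of stream. */
    String nextChunk(int maxChars) throws IOException {
        char[] buf = new char[maxChars];
        int n = reader.read(buf, 0, maxChars);
        // Only the first n chars are valid; never convert the whole array.
        return n == -1 ? null : new String(buf, 0, n);
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo,w\u00f6rld";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ChunkDecoder decoder =
                new ChunkDecoder(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);

        StringBuilder out = new StringBuilder();
        String chunk;
        while ((chunk = decoder.nextChunk(4)) != null) {
            out.append(chunk);
        }
        System.out.println(out.toString().equals(text)); // prints "true"
    }
}
```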
> 
>>
>> olegk wrote:
>>>
>>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
>>>> Hi
>>>>
>>>> I have coded a simple file downloader using HttpClient 4.0.
>>>> It works fine but there is something wrong with the String encoding or
>>>> the buffer stream. The problem is that there are long sequences of
>>>> "NULL" (ANSI code 00) throughout the final file, like this:
>>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
>>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
>>>>
>>>> Here is the main code:
>>>>
>>>> public String getChunk(String url, int bufferSize) throws HTTPClientException
>>>>  {
>>>>    if(!chunkedStarted)
>>>>    {
>>>>      chunkedIns = getInputStream(url);
>>>>      chunkedStarted = true;
>>>>    }
>>>>
>>>>    byte[] tmp = new byte[bufferSize];
>>>>    try
>>>>    {
>>>>      if(chunkedIns.read(tmp) != -1)
>>>>      {
>>>
>>> What makes you think that the entire buffer will be filled with data?
>>>
>>> Oleg
>>>
>>>
>>>>        return new String(tmp);
>>>>      }
>>>>      else
>>>>      {
>>>>        finish();
>>>>        return null;
>>>>      }
>>>>    }
>>>>    catch(IOException e)
>>>>    {
>>>>      HTTPClientException e2 = new HTTPClientException(e.getMessage());
>>>>      e2.setStackTrace(e.getStackTrace());
>>>>      throw e2;
>>>>    }
>>>>  }
>>>>
>>>>  public void finish()
>>>>  {
>>>>    // do some cleaning
>>>>  }
>>>>
>>>>  private InputStream getInputStream(String url) throws HTTPClientException
>>>>  {
>>>>    InputStream instream = null;
>>>>
>>>>    httpClient = new DefaultHttpClient();
>>>>    httpClient.getParams().setParameter("http.useragent", AGENT_NAME);
>>>>
>>>>    HttpGet httpGet = new HttpGet(url);
>>>>    HttpResponse response = null;
>>>>
>>>>    try
>>>>    {
>>>>      response = httpClient.execute(httpGet);
>>>>      HttpEntity entity = response.getEntity();
>>>>
>>>>      if(entity != null)
>>>>      {
>>>>        instream = entity.getContent();
>>>>      }
>>>>    }
>>>>    catch(ClientProtocolException e)
>>>>    {
>>>>      HTTPClientException e2 = new HTTPClientException(e.getMessage());
>>>>      e2.setStackTrace(e.getStackTrace());
>>>>      throw e2;
>>>>    }
>>>>    catch(IOException e)
>>>>    {
>>>>      HTTPClientException e2 = new HTTPClientException(e.getMessage());
>>>>      e2.setStackTrace(e.getStackTrace());
>>>>      throw e2;
>>>>    }
>>>>
>>>>    return instream;
>>>>  }
>>>>
>>>> getChunk and getInputStream could basically be one method, but I need to
>>>> split them for internal convenience; that does not change the
>>>> functionality as a whole.
>>>>
>>>> It seems like either the conversion from bytes to string is a problem:
>>>> return new String(tmp);
>>>>
>>>> or the buffer is not getting filled to the end. The latter should not be
>>>> possible because the files are ~30MB each and the buffer size is 2KB.
>>>>
>>>> I have attached the file, it's a CSV (shortened to ~6KB). Note the long
>>>> white space between some of the URLs; if you just remove it, the URL
>>>> makes sense.
>>>> http://old.nabble.com/file/p27350930/datafeed.csv datafeed.csv
>>>>
>>>> Where can this white space (null) come from?
>>>>
>>>> Thanks!
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
>> Sent from the HttpClient-User mailing list archive at Nabble.com.
>>
> 
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
> 

-- 
View this message in context: 
http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27377093.html
Sent from the HttpClient-User mailing list archive at Nabble.com.

