Re: HttpClient 4.0 encoding madness

amoldavsky Fri, 29 Jan 2010 21:36:44 -0800

Hi,

This solution worked out very well:
    byte[] tmp = new byte[bufferSize];
    int bytesRead;
    try
    {
      if((bytesRead = chunkedIns.read(tmp)) != -1)
      {
        return new String(tmp, 0, bytesRead);
      }
      else
      {
        finish();
        return null;
      }
    }
    catch(IOException e)
    {
      HTTPClientException e2 = new HTTPClientException(e.getMessage());
      e2.setStackTrace(e.getStackTrace());
      throw e2;
    }
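
As Ken noted, new String(tmp, 0, bytesRead) still decodes with the platform default charset, and a multi-byte character could also be cut in half at a chunk boundary. Below is a minimal sketch of one way around both, wrapping the stream in an InputStreamReader with an explicit charset. ChunkDecoder and nextChunk are hypothetical names used only for illustration (they are not part of HttpClient), and UTF-8 is an assumption; the charset the server actually declares should be used instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
public class ChunkDecoder {
    private final Reader reader;

    // The charset is an assumption supplied by the caller,
    // ideally taken from the response's Content-Type header.
    public ChunkDecoder(InputStream in, Charset charset) {
        this.reader = new InputStreamReader(in, charset);
    }

    /** Returns the next chunk of at most bufferSize characters, or null at end of stream. */
    public String nextChunk(int bufferSize) throws IOException {
        char[] tmp = new char[bufferSize];
        int charsRead = reader.read(tmp);
        if (charsRead == -1) {
            return null;
        }
        // Only the characters actually read are turned into a String.
        return new String(tmp, 0, charsRead);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8);
        ChunkDecoder decoder =
            new ChunkDecoder(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        String chunk;
        while ((chunk = decoder.nextChunk(4)) != null) {
            sb.append(chunk);
        }
        System.out.println(sb);
    }
}
```

The Reader does the byte-to-character decoding across reads, so the caller never sees a partial character, and the chunk size is counted in characters rather than bytes.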



If it's not too much trouble, could anybody please explain why the buffer
may not be 100% full when I read it? I think it all depends on how the
implementation was done (in this case by Sun), and if Sun decided to
implement buffering this way, I don't understand the logic behind it.
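
For anyone who wants to see the behaviour without a network connection, here is a minimal sketch. TricklingStream is a hypothetical stand-in for a socket that delivers data in small pieces (the limit of 3 bytes per call is arbitrary), and fill is a hypothetical helper that keeps calling read until the buffer is full or the stream ends. Neither is part of HttpClient or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FillBuffer {

    /** Hypothetical stream that hands out at most 3 bytes per read call. */
    static class TricklingStream extends InputStream {
        private final InputStream in;

        TricklingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliver less than was asked for, as a network stream may do.
            return in.read(b, off, Math.min(len, 3));
        }
    }

    /** Reads until buf is full or the stream ends; returns the number of bytes stored. */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream before the buffer was full
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes("US-ASCII");

        InputStream single = new TricklingStream(new ByteArrayInputStream(data));
        System.out.println(single.read(new byte[8])); // prints 3, not 8

        InputStream looped = new TricklingStream(new ByteArrayInputStream(data));
        System.out.println(fill(looped, new byte[8])); // prints 8
    }
}
```

Even with such a loop, the last buffer of a document is normally only partly filled, so the returned count still has to be used when building the String.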


Thank you very much Oleg, Ken and sebb-2-2 for your earlier input!



sebb-2-2 wrote:
> 
> On 29/01/2010, amoldavsky <[email protected]> wrote:
>>
>>  Hi Oleg,
>>
>>  Let me rephrase the question in better terms:
>>  If the server document is Y and buffer size is X, let's even assume that
>>  Y = kX where X < Y, is it possible that any buffer 0 < x < (k-1) will not
>>  be fully filled?
> 
> Remember that HTTP packets may be broken up in transit.
> 
> However, even without that, it's never safe to assume that a buffer is
> filled.
> 
> That's what the return value from read(buffer) is for - it tells you
> how many bytes were actually read into the buffer.
> 
>>  Thanks!
>>  -Assaf
>>
>>
>>
>>  Ken Krugler wrote:
>>  >
>>  >
>>  > On Jan 28, 2010, at 10:09pm, amoldavsky wrote:
>>  >
>>  >>
>>  >> Hi Oleg,
>>  >> Thank you for the quick reply.
>>  >>
>>  >> So if there is a possibility that the whole buffer is not filled, how
>>  >> can I ensure or force HttpClient to fill the whole buffer? Should I
>>  >> maybe avoid Stream Readers altogether?
>>  >
>>  > If bufferSize is X, and the server document you're fetching has Y
>>  > bytes, then what do you mean by "force HttpClient to fill the whole
>>  > buffer"?
>>  >
>>  > At a minimum, you'd want
>>  >
>>  > int bytesRead = chunkedIns.read(tmp);
>>  > if (bytesRead != -1) {
>>  >     return new String(tmp, 0, bytesRead);
>>  > }
>>  >
>>  > But that also uses the platform default encoding for the character
>>  > set, which often won't be correct.
>>  >
>>  > -- Ken
>>  >
>>  >>
>>  >> olegk wrote:
>>  >>>
>>  >>> On Wed, 2010-01-27 at 20:24 -0800, amoldavsky wrote:
>>  >>>> Hi
>>  >>>>
>>  >>>> I have coded a simple file downloader using HttpClient 4.0.
>>  >>>> It works fine but there is something wrong with the String
>>  >>>> encoding or the buffer stream. The problem is that there are long
>>  >>>> sequences of "NULL" (ANSI code 00) throughout the final file, like
>>  >>>> this:
>>  >>>> http://old.nabble.com/file/p27350930/httpclient_error01.jpg
>>  >>>> http://old.nabble.com/file/p27350930/httpclient_error02.jpg
>>  >>>>
>>  >>>> Here is the main code:
>>  >>>>
>>  >>>> public String getChunk(String url, int bufferSize) throws
>>  >>>> HTTPClientException
>>  >>>>  {
>>  >>>>    if(!chunkedStarted)
>>  >>>>    {
>>  >>>>      chunkedIns = getInputStream(url);
>>  >>>>      chunkedStarted = true;
>>  >>>>    }
>>  >>>>
>>  >>>>    byte[] tmp = new byte[bufferSize];
>>  >>>>    try
>>  >>>>    {
>>  >>>>      if(chunkedIns.read(tmp) != -1)
>>  >>>>      {
>>  >>>
>>  >>> What makes you think that the entire buffer will be filled with
>>  >>> data?
>>  >>>
>>  >>> Oleg
>>  >>>
>>  >>>
>>  >>>>        return new String(tmp);
>>  >>>>      }
>>  >>>>      else
>>  >>>>      {
>>  >>>>        finish();
>>  >>>>        return null;
>>  >>>>      }
>>  >>>>    }
>>  >>>>    catch(IOException e)
>>  >>>>    {
>>  >>>>      HTTPClientException e2 = new
>>  >>>> HTTPClientException(e.getMessage());
>>  >>>>      e2.setStackTrace(e.getStackTrace());
>>  >>>>      throw e2;
>>  >>>>    }
>>  >>>>  }
>>  >>>>
>>  >>>>  public void finish()
>>  >>>>  {
>>  >>>>    // do some cleaning
>>  >>>>  }
>>  >>>>
>>  >>>>   private InputStream getInputStream(String url) throws
>>  >>>> HTTPClientException
>>  >>>>  {
>>  >>>>    InputStream instream = null;
>>  >>>>
>>  >>>>    httpClient = new DefaultHttpClient();
>>  >>>>    httpClient.getParams().setParameter("http.useragent",
>>  >>>> AGENT_NAME);
>>  >>>>
>>  >>>>    HttpGet httpGet = new HttpGet(url);
>>  >>>>    HttpResponse response = null;
>>  >>>>
>>  >>>>    try
>>  >>>>    {
>>  >>>>      response = httpClient.execute(httpGet);
>>  >>>>      HttpEntity entity = response.getEntity();
>>  >>>>
>>  >>>>      if(entity != null)
>>  >>>>      {
>>  >>>>        instream = entity.getContent();
>>  >>>>      }
>>  >>>>    }
>>  >>>>    catch(ClientProtocolException e)
>>  >>>>    {
>>  >>>>      HTTPClientException e2 = new
>>  >>>> HTTPClientException(e.getMessage());
>>  >>>>      e2.setStackTrace(e.getStackTrace());
>>  >>>>      throw e2;
>>  >>>>    }
>>  >>>>    catch(IOException e)
>>  >>>>    {
>>  >>>>      HTTPClientException e2 = new
>>  >>>> HTTPClientException(e.getMessage());
>>  >>>>      e2.setStackTrace(e.getStackTrace());
>>  >>>>      throw e2;
>>  >>>>    }
>>  >>>>
>>  >>>>    return instream;
>>  >>>>  }
>>  >>>>
>>  >>>> getChunk and getInputStream could basically be one method, but I
>>  >>>> need to split them for internal convenience; that does not change
>>  >>>> the functionality as a whole.
>>  >>>>
>>  >>>> It seems like either the conversion from bytes to string is a
>>  >>>> problem:
>>  >>>> return new String(tmp);
>>  >>>>
>>  >>>> or that the buffer is not getting filled to the end. The latter
>>  >>>> could not
>>  >>>> be
>>  >>>> possible because the files are ~30MB each and the buffer size is
>>  >>>> 2Kb.
>>  >>>>
>>  >>>> I have attached the file, it's a CSV (shortened to ~6KB). Note the
>>  >>>> long white space between some of the URLs; if you just remove it,
>>  >>>> the URL makes sense.
>>  >>>> http://old.nabble.com/file/p27350930/datafeed.csv datafeed.csv
>>  >>>>
>>  >>>> Where can this white space (null) come from?
>>  >>>>
>>  >>>> Thanks!
>>  >>>
>>  >>>
>>  >>>
>>  >>>
>> ---------------------------------------------------------------------
>>  >>> To unsubscribe, e-mail: [email protected]
>>  >>> For additional commands, e-mail: [email protected]
>>  >>>
>>  >>>
>>  >>>
>>  >>
>>  >> --
>>  >> View this message in context:
>>  >>
>> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27366928.html
>>  >> Sent from the HttpClient-User mailing list archive at Nabble.com.
>>  >>
>>  >>
>>  >>
>>  >
>>  > --------------------------------------------
>>  > Ken Krugler
>>  > +1 530-210-6378
>>  > http://bixolabs.com
>>  > e l a s t i c   w e b   m i n i n g
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >
>>
>>  --
>>
>> View this message in context:
>> http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27377093.html
>>
>> Sent from the HttpClient-User mailing list archive at Nabble.com.
>>
>>
>>
>>
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/HttpClient-4.0-encoding-madness-tp27350930p27381546.html
Sent from the HttpClient-User mailing list archive at Nabble.com.


