[ https://issues.apache.org/jira/browse/HAWQ-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Forson updated HAWQ-1210:
---------------------------------
    Description: 
Hi,

I've been using libhdfs3 in a single-threaded environment for several months 
now, without any problems. However, as soon as I tried using the library 
concurrently from multiple threads: hello, segfaults.

Although the source of these segfaults is annoyingly subtle, I've managed to 
isolate it to a relatively small block of my code that does nothing interesting 
aside from using libhdfs3 to download a single hdfs file.

To be clear: I assume that the mistake here is mine -- that is, that I am using 
your library incorrectly. However, I have been unable to find any documentation 
as to how the libhdfs3 API _should_ be used in a multi-threaded environment. I 
initially interpreted this to mean, "go to town, it's all more or less 
thread-safe", but I am now questioning that interpretation.

So, I have a question, and a request.

Question: Are there any known, non-obvious concurrency gotchas regarding the 
usage of libhdfs3 (or whatever it's currently called)?

Request: Could you please add some documentation, to the README and/or hdfs.h, 
regarding usage in a concurrent environment? (ideally, such notes would 
annotate individual components of the API in hdfs.h, but if the answer to my 
question above is, "No", then this could perhaps be a single sentence in the 
README which affirmatively states that the library is generally safe for 
concurrent usage without additional/explicit synchronization -- anything would 
be better than nothing :))

  was:
Hi,

I've been using libhdfs3 in a single-threaded environment for several months 
now, without any problems. However, as soon as I tried using the library 
concurrently from multiple threads: hello, segfaults.

Although the source of these segfaults is annoyingly subtle, I've managed to 
isolate it to a relatively small block of my code that does nothing interesting 
aside from using libhdfs3 to download a single hdfs file.

To be clear: I assume that the mistake here is mine -- that is, that I am using 
your library incorrectly. However, I have been unable to find any documentation 
as to how the libhdfs3 API _should_ be used in a multi-threaded environment. I 
initially interpreted this to mean, "go to town, it's all more or less 
threadsafe", but I am now questioning that interpretation.

So, I have a question, a request.

Question: Are there any known, non-obvious concurrency gotchas regarding the 
usage of libhdfs3 (or whatever it's currently called)?

Request: Could you please add some documentation, to the README and/or hdfs.h, 
regarding usage in a concurrent environment? (ideally, such notes would 
annotate individual components of the API in hdfs.h, but if the answer to my 
question above is, "No", then this could perhaps be a single sentence in the 
README which affirmatively states that the library is generally safe for 
concurrent usage without additional/explicit synchronization -- anything would 
be better than nothing :))


> Documentation regarding usage of libhdfs3 in concurrent environment
> -------------------------------------------------------------------
>
>                 Key: HAWQ-1210
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1210
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: libhdfs
>            Reporter: William Forson
>            Assignee: Lei Chang



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
