Manu - This is the relevant code I was referencing before: https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/webhdfs_hook.py#L54-L71
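For anyone reading the archive without clicking through, the pattern in that hook is roughly the following. This is just a simplified sketch of the idea (the helper name is mine, not the hook's API); the real hook subclasses BaseHook and uses the hdfs package's InsecureClient:

# Simplified sketch: every connection stored under the same conn_id is
# tried in turn, and the first namenode that answers a status() call wins.
from hdfs import HdfsError, InsecureClient

from airflow.hooks.base_hook import BaseHook


def get_active_webhdfs_client(conn_id='webhdfs_default'):
    # get_connections() returns *all* connections registered under this conn_id
    for nn in BaseHook.get_connections(conn_id):
        try:
            client = InsecureClient('http://{host}:{port}'.format(
                host=nn.host, port=nn.port))
            client.status('/')  # cheap call to check this namenode is active
            return client
        except HdfsError:
            pass  # standby or unreachable namenode, try the next one
    raise RuntimeError('No active namenode found for conn_id %s' % conn_id)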
So multiple connections for a given conn_id are already built into some hooks, but we need a way to set this from the CLI. I'll be creating a JIRA shortly and pushing an update to the CLI for this.

On Thu, Aug 30, 2018 at 2:03 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Thanks Xiaodong, that works like a charm.
>
> Manu
>
> On Thu, Aug 30, 2018 at 11:34 AM Deng Xiaodong <xd.den...@gmail.com> wrote:
>
> > Hi Manu,
> >
> > You can set up multiple connections with the same conn_id and different
> > hosts, rather than setting everything in one single connection.
> >
> > XD
> >
> > On Thu, Aug 30, 2018 at 11:17 Manu Zhang <owenzhang1...@gmail.com> wrote:
> >
> > > Hi Ben,
> > >
> > > How do you set multiple connections through the Web UI (from the
> > > Connections item of the Admin pull-down list)? I tried setting a
> > > comma-separated list for a conn_id but that doesn't work.
> > >
> > > Thanks,
> > > Manu
> > >
> > > On Wed, Aug 29, 2018 at 11:31 PM Ben Laird <br.la...@gmail.com> wrote:
> > >
> > > > Hi Manu,
> > > >
> > > > We have the same use case as you, a primary and a backup namenode. If I
> > > > understand your issue correctly, the WebHDFSSensor code checks an
> > > > iterable of Airflow connections to the namenode to find one that is
> > > > active.
> > > >
> > > > However, my issue (which I've emailed this list about) was that you
> > > > cannot set multiple connections with the same name (e.g.
> > > > webhdfs_default) through the CLI, only in the Web interface. I'm
> > > > planning on submitting a PR soon to remedy this.
> > > >
> > > > Ben
> > > >
> > > > On Wed, Aug 29, 2018 at 2:57 AM Driesprong, Fokko <fo...@driesprong.frl>
> > > > wrote:
> > > >
> > > > > Hi Manu,
> > > > >
> > > > > Thanks for raising this question. There is a PR for moving to hdfs3
> > > > > <https://github.com/apache/incubator-airflow/pull/3560>. There is
> > > > > code in the existing codebase which supports HA
> > > > > <https://github.com/apache/incubator-airflow/blob/53b89b98371c7bb993b242c341d3941e9ce09f9a/airflow/hooks/hdfs_hook.py#L92-L96>,
> > > > > but this might not be used by the sensor.
> > > > >
> > > > > Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to
> > > > > judge how mature it is. We need to replace Snakebite for sure since
> > > > > it is only compatible with Python 2.7.
> > > > >
> > > > > Cheers, Fokko
> > > > >
> > > > > On Wed, Aug 29, 2018 at 04:29 Manu Zhang <owenzhang1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We've been using WebHdfsSensor happily to monitor the state of
> > > > > > upstream tasks outputting to HDFS, except when there is a namenode
> > > > > > switch. I've opened
> > > > > > https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss
> > > > > > HDFS HA support.
> > > > > >
> > > > > > There are two solutions that I can see:
> > > > > >
> > > > > > 1. use pyarrow.hdfs, which has HA support
> > > > > > 2. allow users to configure a list of namenodes
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > Thanks,
> > > > > > Manu Zhang
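P.S. Until the CLI catches up, one possible workaround is to insert the duplicate rows directly through the ORM, which is essentially what the Web UI does. A rough sketch, assuming Airflow 1.10-era internals; the hostnames, port and conn_type below are placeholders:

# Register one connection per namenode under the same conn_id in the
# metadata DB, so hooks/sensors that iterate connections can fail over.
from airflow import settings
from airflow.models import Connection

session = settings.Session()
for host in ['namenode1.example.com', 'namenode2.example.com']:
    session.add(Connection(
        conn_id='webhdfs_default',  # same conn_id for both rows
        conn_type='hdfs',           # placeholder; match whatever your hook expects
        host=host,
        port=50070,                 # default WebHDFS port; adjust as needed
    ))
session.commit()
session.close()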