Hail license changed

2012-12-12 Thread Jeff Garzik

Hail license change was just pushed to the github hail repository.

Jeff


--
To unsubscribe from this list: send the line "unsubscribe hail-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hail status and update (was Re: Question about hail)

2012-11-27 Thread Jeff Garzik


(CC'd hail-devel list)

On 11/26/2012 02:28 AM, Hideki Yamane wrote:

Hello hail upstream authors,

  I'm interested in porting hail (and Aeolus) to Debian, but have some
  questions about it.


Cool!



Q: Is this project still alive? If so, where is the current main site?
Could you tell me the status, please?


The main site for source code is currently github.

https://github.com/jgarzik/hail
https://github.com/jgarzik/tabled
https://github.com/jgarzik/itd
https://github.com/jgarzik/nfs4d

The Hail home page is https://hail.wiki.kernel.org/ but that clearly has 
some stale links on it (see below).


The project is definitely on a long pause at the moment, as it was 
de-prioritized by my employer, but I still aim to complete the 
Paxos implementation in CLD, making it truly distributed.


Patches are still accepted, and I want it to keep working on current 
platforms.




  - Its upstream site hail.wiki.kernel.org points to some materials,
but those are all empty.
 slide & audio: http://www.kernel.org/pub/media/talks/hail/
 git repo: http://git.kernel.org/?p=daemon/distsrv/itd.git
 source: http://www.kernel.org/pub/software/network/distsrv/


Hail was collateral damage in a kernel.org hack.  No data was lost or 
compromised, but it took kernel.org months to recover even basic account 
services and git access.  wikis took months longer after that.  I'm 
still waiting to see if anybody has an archive of old tarballs, because 
k.org was my canonical upstream storage location, with zero local ones.


Local git cryptographic integrity -- the canonical root of all the 
source code -- never disappeared, and was always verified and 
not-hacked.  It briefly moved to github, but is now back at kernel.org.


The tarballs can conceivably be recovered by checking out a git tag, 
re-running autogen.sh, and then make dist... but with 
autoconf/automake/libtool upgrades over the years, tarball checksums 
might change using that method.




   and there's no mail in 
http://news.gmane.org/gmane.comp.distributed.hail.devel
   except spam.


Pete complained about this too.  Not sure what to do about it -- the 
mail server is vger.kernel.org, same as LKML, with a postmaster who 
maintains the spam filter to similar setup and standards.


I was surprised at Pete's spam report at first, too.  I never see any of 
the spam, only the rare hail-devel message, because of good spam 
filtering.  Of course, the M-L archiving bots seem to keep it all.




Q: Can you add special exemptions for OpenSSL to its license,
and add "or later", too?

  - Its license is GPL-2, right? And it links to OpenSSL.
However, the OpenSSL license is NOT compatible with the GPL without special 
exemptions;
see 
http://lintian.debian.org/tags/possible-gpl-code-linked-with-openssl.html
and http://www.openssl.org/support/faq.html#LEGAL2

So, we cannot distribute the hail binary without those special exemptions.

  - GPL-2 and GPL-3 are NOT compatible.
However, Image Warehouse (part of Aeolus) is licensed under GPL-3, and
it seems to link to the hail library (hstor.h).

We should change
 a) hail to GPL-2 or later, or GPL-3 (or later)
 b) iwhd to GPL-2
to avoid license incompatibility.


History:  hail was GPL-2 only, following the lead of the kernel.  But it 
sounds like this is impractical given the few existing users, so I am in 
favor of relicensing to GPL-2 or later.


I disagree with the interpretation vis a vis openssl and several others 
do too.  However, if this is an impediment to use, I would be happy to 
accept a pull request adding the openssl exemption language.


Jeff






Project Hail wikis alive again!

2012-05-02 Thread Jeff Garzik
kernel.org fixed their wiki system, which means that all the k.org wikis 
are once again read-write!  This includes Project Hail's home page,


https://hail.wiki.kernel.org/

I hope to have the git repos moved back from https://github.com/jgarzik/ 
to kernel.org soon also.


Jeff




Re: Unable to log into Hail Wiki

2012-01-26 Thread Jeff Garzik

On 01/25/2012 08:40 PM, Pete Zaitcev wrote:

Jeff, looks like the wiki rots. The login points to this URL
  https://hail.wiki.kernel.org/articles/u/s/e/Special%7EUserLogin_94cd.html
It returns 404. HALP?


Yes -- all kernel.org wikis are _still_ read-only, even this many months 
after the kernel.org breakin.  ata.wiki.kernel.org has similar behavior.


Jeff





Re: [patch hail 1/1] Plug leak in hstor_parse_key

2011-10-18 Thread Jeff Garzik

On 10/14/2011 01:34 PM, Pete Zaitcev wrote:

Signed-off-by: Pete Zaitcev <zait...@kotori.zaitcev.us>

---
  lib/hstor.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/hstor.c b/lib/hstor.c
index cb9c4da..5ce9b76 100644
--- a/lib/hstor.c
+++ b/lib/hstor.c
@@ -761,7 +761,7 @@ void hstor_free_keylist(struct hstor_keylist *keylist)
  static void hstor_parse_key(xmlDocPtr doc, xmlNode *node,
  struct hstor_keylist *keylist)
  {
-   struct hstor_object *obj = calloc(1, sizeof(*obj));
+   struct hstor_object *obj;
xmlChar *xs;


good catch... applied





Re: Hail git tree location

2011-10-04 Thread Jeff Garzik

On 10/04/2011 12:12 PM, Pete Zaitcev wrote:

Are we going to have a git tree somewhere? It looks like our old one
was purged from git.kernel.org.


Sorry, I should have posted.  It was migrated along with the kernel.org 
trees to


https://github.com/jgarzik/{hail,tabled,itd,nfs4d}

though kernel.org ones should be coming back.

Jeff




Re: FYI, rpc/ is gone from Fedora 15

2011-05-05 Thread Jeff Garzik

On 05/05/2011 10:14 AM, Jim Meyering wrote:

FYI, /usr/include/rpc/ no longer exists, as of F15's glibc-headers-2.13.90-10,
so hail's lib/cld_msg_rpc.h will have to do something about this #include:

 $ grep rpc.h lib/cld_msg_rpc.h
 #include <rpc/rpc.h>


hm.  Surely they did not delete sunrpc from glibc?

That would be disappointing.

Jeff





[hail] CLD conversion to TCP lands in git

2011-03-24 Thread Jeff Garzik


I just pushed the CLD protocol change (UDP -> TCP) to hail.git[1].  See 
the original post[2] for more details.  It seems pretty solid from my 
beating on it, but it's still raw code.


The focus will be on hammering out the kinks in this switch over the 
next 7-10 days, so expect some breakage and churn during that time.


In particular, while tabled /should/ work, as the API has only seen 
minor changes, it will need a good stress test before I regain 
confidence in it.  There was also an unrelated-to-CLD API change in 
libhail that requires a small tabled tweak[3], that will be attended-to 
this week.


Jeff


[1] 
http://git.kernel.org/?p=daemon/distsrv/hail.git;a=commit;h=3bdeaab68e1c2776a3488ac03f49f7b4bc2659c8

[2] http://marc.info/?l=hail-devel&m=129379489716486&w=2
[3] 
http://git.kernel.org/?p=daemon/distsrv/hail.git;a=commit;h=59becbb9e329cdc20e4894f331fcb8dfc104c35a





Re: [PATCH 2/3] CLD: switch network proto from UDP to TCP

2011-01-03 Thread Jeff Garzik

On 01/02/2011 06:32 PM, Pete Zaitcev wrote:

On Fri, 31 Dec 2010 05:57:28 -0500
Jeff Garzik <j...@garzik.org> wrote:


+   struct cldc_tcp *tcp = private;
+   ssize_t rc;
+   struct ubbp_header ubbp;
+
+   memcpy(ubbp.magic, "CLD1", 4);
+   ubbp.op_size = (buflen << 8) | 1;
+#ifdef WORDS_BIGENDIAN
+   swab32(ubbp.op_size);
+#endif
+
+   rc = write(tcp->fd, &ubbp, sizeof(ubbp));


Why not this:

unsigned int n;

n = (buflen << 8) | 1;
ubbp.op_size = GUINT32_TO_LE(n);


Yep.

I used the #ifdef on the read(2) side, where I did not want to create an 
additional var...  then I copied that onto the write(2) side, where it 
is less efficient as you point out.
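For comparison, the same header can be produced without GLib and without any #ifdef by emitting the 32-bit word one byte at a time in little-endian order (the byte order implied by GUINT32_TO_LE and by swapping only under WORDS_BIGENDIAN). This is a sketch under assumptions: the layout (a 4-byte "CLD1" magic followed by a word holding (length << 8) | opcode) is inferred from the quoted hunk, and the helpers ubbp_encode()/ubbp_decode() are hypothetical, not hail's API. Note also that swab32(), as defined later in this patch, returns the swapped value, so the bare statement `swab32(ubbp.op_size);` in the quoted hunk discards its result; it would need to be `ubbp.op_size = swab32(ubbp.op_size);`.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical wire layout inferred from the quoted hunk:
 *   bytes 0-3: magic "CLD1"
 *   bytes 4-7: little-endian uint32, (payload_len << 8) | opcode
 */
enum { UBBP_HDR_LEN = 8 };

/* Returns 0 on success, -1 if payload_len does not fit in 24 bits. */
static int ubbp_encode(unsigned char hdr[UBBP_HDR_LEN],
		       uint32_t payload_len, uint8_t opcode)
{
	uint32_t n;

	if (payload_len > 0xffffffu)	/* only 24 bits are available */
		return -1;
	n = (payload_len << 8) | opcode;

	memcpy(hdr, "CLD1", 4);
	/* Least-significant byte first: correct on any host byte order. */
	hdr[4] = (unsigned char)(n & 0xff);
	hdr[5] = (unsigned char)((n >> 8) & 0xff);
	hdr[6] = (unsigned char)((n >> 16) & 0xff);
	hdr[7] = (unsigned char)((n >> 24) & 0xff);
	return 0;
}

/* Returns 0 on success, -1 if the magic does not match. */
static int ubbp_decode(const unsigned char hdr[UBBP_HDR_LEN],
		       uint32_t *payload_len, uint8_t *opcode)
{
	uint32_t n;

	if (memcmp(hdr, "CLD1", 4) != 0)
		return -1;
	n = (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8) |
	    ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
	*opcode = (uint8_t)(n & 0xff);
	*payload_len = n >> 8;
	return 0;
}
```

Encoding byte by byte costs a few shifts, but it keeps the read(2) and write(2) paths symmetric and independent of host byte order.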


Jeff





Re: Crash with db5

2011-01-02 Thread Jeff Garzik

On 01/02/2011 08:20 PM, Pete Zaitcev wrote:

Looks like Rawhide throws this if libdb-devel is in use:

make  check-TESTS
make[3]: Entering directory `/q/zaitcev/hail/hail-tip/test/cld'
PASS: prep-db
DB_ENV->lsn_reset: method not permitted before handle's open method
DB_ENV->dbremove: method not permitted before handle's open method
cld[11548]: SIGSEGV
PASS: start-daemon
port file not found.
FAIL: pid-exists

libdb-5.1.19-2.fc15.x86_64


Are you compiling with db4 headers, but linking with db5?
Or vice versa?

This is a problem I ran into, with F14.  hail's configure script 
searches for the first libdb, which will always be libdb5 on >= F14, 
because libdb5 is always installed due to dependencies.  But... you can 
either have db4-devel or libdb-devel installed for the devel pkg.


If you have db4-devel + libdb5... boom.

Jeff




[PATCH 1/3] CLD: convert back to libevent

2010-12-31 Thread Jeff Garzik

Switch CLD from hand-rolled server poll code, to libevent.  Follows
similar techniques and rationale as chunkd commit
c1aed7464f237e5a6309351bf003162c77d69e27.  This reverts ancient commit
90b3b5edcf5aa00577f4395fdbb490ed7e9be824.

Signed-off-by: Jeff Garzik <jgar...@redhat.com>
---
 cld/Makefile.am |3 -
 cld/cld.h   |   22 +++
 cld/server.c|  161 
 cld/session.c   |   69 
 4 files changed, 118 insertions(+), 137 deletions(-)

diff --git a/cld/Makefile.am b/cld/Makefile.am
index 9a13ce0..30eea0b 100644
--- a/cld/Makefile.am
+++ b/cld/Makefile.am
@@ -12,7 +12,8 @@ cld_SOURCES   = cldb.h cld.h \
  cldb.c msg.c server.c session.c util.c
 cld_LDADD  = \
  ../lib/libhail.la @GLIB_LIBS@ @CRYPTO_LIBS@ \
- @SSL_LIBS@ @DB4_LIBS@ @XML_LIBS@ @LIBCURL@
+ @SSL_LIBS@ @DB4_LIBS@ @XML_LIBS@ @LIBCURL@ \
+ @EVENT_LIBS@
 
 cldbadm_SOURCES= cldb.h cldbadm.c
 cldbadm_LDADD  = @CRYPTO_LIBS@ @GLIB_LIBS@ @DB4_LIBS@
diff --git a/cld/cld.h b/cld/cld.h
index 4c0099f..17f14b8 100644
--- a/cld/cld.h
+++ b/cld/cld.h
@@ -22,8 +22,9 @@
 
 #include netinet/in.h
 #include sys/time.h
-#include poll.h
+#include event.h
 #include glib.h
+#include elist.h
 #include cldb.h
 #include cld_msg_rpc.h
 #include cld_common.h
@@ -59,13 +60,13 @@ struct session {
 
    uint64_t            last_contact;
    uint64_t            next_fh;
-   struct cld_timer    timer;
+   struct event        timer;
 
    uint64_t            next_seqid_in;
    uint64_t            next_seqid_out;
 
    GList               *out_q;     /* outgoing pkts (to client) */
-   struct cld_timer    retry_timer;
+   struct event        retry_timer;
 
    char                user[CLD_MAX_USERNAME];
 
@@ -85,10 +86,10 @@ struct server_stats {
    unsigned long       garbage;    /* num. garbage pkts dropped */
 };
 
-struct server_poll {
+struct server_socket {
    int                 fd;
-   bool                (*cb)(int fd, short events, void *userdata);
-   void                *userdata;
+   struct event        ev;
+   struct list_head    sockets_node;
 };
 
 struct server {
@@ -103,14 +104,13 @@ struct server {
 
    struct cldb         cldb;       /* database info */
 
-   GArray              *polls;
-   GArray              *poll_data;
+   struct event_base   *evbase_main;
 
-   GHashTable          *sessions;
+   struct list_head    sockets;
 
-   struct cld_timer_list   timers;
+   GHashTable          *sessions;
 
-   struct cld_timer    chkpt_timer;    /* db4 checkpoint timer */
+   struct event        chkpt_timer;    /* db4 checkpoint timer */
 
    struct server_stats stats;      /* global statistics */
 };
diff --git a/cld/server.c b/cld/server.c
index 7a57785..aed501b 100644
--- a/cld/server.c
+++ b/cld/server.c
@@ -559,7 +559,7 @@ static void simple_sendresp(int sock_fd, const struct client *cli,
               info->op);
 }
 
-static bool udp_srv_event(int fd, short events, void *userdata)
+static void udp_srv_event(int fd, short events, void *userdata)
 {
struct client cli;
char host[64];
@@ -586,7 +586,7 @@ static bool udp_srv_event(int fd, short events, void *userdata)
    rrc = recvmsg(fd, &hdr, 0);
    if (rrc < 0) {
        syslogerr("UDP recvmsg");
-   return true; /* continue main loop; do NOT terminate server */
+   return;
}
cli.addr_len = hdr.msg_namelen;
 
@@ -601,59 +601,60 @@ static bool udp_srv_event(int fd, short events, void *userdata)
 
if (!parse_pkt_header(raw_pkt, rrc, pkt, hdr_len)) {
cld_srv.stats.garbage++;
-   return true;
+   return;
}
 
if (!get_pkt_info(pkt, raw_pkt, rrc, hdr_len, info)) {
xdr_free((xdrproc_t)xdr_cld_pkt_hdr, (char *)pkt);
cld_srv.stats.garbage++;
-   return true;
+   return;
}
 
if (packet_is_dupe(info)) {
/* silently drop dupes */
xdr_free((xdrproc_t)xdr_cld_pkt_hdr, (char *)pkt);
-   return true;
+   return;
}
 
err = validate_pkt_session(info, cli);
if (err) {
simple_sendresp(fd, cli, info, err);
xdr_free((xdrproc_t)xdr_cld_pkt_hdr, (char *)pkt);
-   return true;
+   return;
}
 
err = pkt_chk_sig(raw_pkt, rrc, pkt);
if (err) {
simple_sendresp(fd, cli, info, err);
xdr_free((xdrproc_t)xdr_cld_pkt_hdr, (char *)pkt);
-   return true;
+   return;
}
 
if (!(cld_srv.cldb.is_master

[PATCH 2/3] CLD: switch network proto from UDP to TCP

2010-12-31 Thread Jeff Garzik

Convert CLD network protocol from UDP to TCP.  Server, client lib,
and chunkd's cldu module are all updated.  tabled's cldu module must
be updated also.

The original rationale for UDP use was following Google's lead, based
on the advice in the original Chubby paper, describing TCP's back-off
policies and other behavior during times of high network congestion.

This seems a bit dubious without further third party evidence, and TCP
vastly simplifies our lives.  While the code remains open and modular
enough to support other protocols (hopefully RDMA or SCTP one day),
this upgrade from UDP to TCP promises to make the current codebase
much easier to use, while avoiding the "reinvent TCP, by using UDP"
problem, which was a rabbit hole threatening CLD.
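One consequence of the switch is that TCP delivers a byte stream, so a single message may arrive across several read(2) calls; the struct atcp_read this patch adds (buf, bytes_wanted, bytes_read, a completion cb) suggests the server accumulates exactly the bytes it expects before acting. The following is a minimal sketch of servicing one such pending read on a non-blocking fd. The struct is a simplified stand-in using the patch's field names, while pending_read_service() and its return convention are hypothetical, not hail's code.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Simplified stand-in for the patch's struct atcp_read (list linkage omitted). */
struct pending_read {
	void		*buf;
	unsigned int	bytes_wanted;
	unsigned int	bytes_read;
	void		(*cb)(void *cb_data, bool success);
	void		*cb_data;
};

/*
 * Hypothetical helper, called when fd becomes readable.
 * Returns  1 when the request is complete (cb called with true),
 *          0 if the fd has no more data yet (EAGAIN; wait for next event),
 *         -1 on EOF or a hard error (cb called with false).
 */
static int pending_read_service(int fd, struct pending_read *rd)
{
	while (rd->bytes_read < rd->bytes_wanted) {
		ssize_t rc = read(fd, (char *)rd->buf + rd->bytes_read,
				  rd->bytes_wanted - rd->bytes_read);
		if (rc > 0) {
			rd->bytes_read += (unsigned int)rc;
			continue;
		}
		if (rc < 0 && errno == EINTR)
			continue;
		if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return 0;
		if (rd->cb)
			rd->cb(rd->cb_data, false);	/* EOF or hard error */
		return -1;
	}
	if (rd->cb)
		rd->cb(rd->cb_data, true);
	return 1;
}
```

In an event-loop design the caller would re-arm EV_READ whenever this returns 0, and pop the next queued read once it returns 1.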

Signed-off-by: Jeff Garzik <jgar...@redhat.com>
---
 chunkd/cldu.c|6 
 cld/cld.h|   43 ++
 cld/msg.c|4 
 cld/server.c |  356 ---
 cld/session.c|4 
 configure.ac |1 
 include/Makefile.am  |2 
 include/cld_common.h |4 
 include/cldc.h   |   24 ++-
 include/ncld.h   |4 
 include/ubbp.h   |   52 +++
 lib/Makefile.am  |2 
 lib/cldc-dns.c   |2 
 lib/cldc-tcp.c   |  185 ++
 lib/cldc-udp.c   |  141 
 lib/cldc.c   |   54 +++
 16 files changed, 595 insertions(+), 289 deletions(-)

diff --git a/chunkd/cldu.c b/chunkd/cldu.c
index 026c523..41f94b5 100644
--- a/chunkd/cldu.c
+++ b/chunkd/cldu.c
@@ -165,7 +165,7 @@ static void cldu_sess_event(void *priv, uint32_t what)
 */
    if (cs->nsess) {
        applog(LOG_ERR, "Session failed, sid " SIDFMT,
-              SIDARG(cs->nsess->udp->sess->sid));
+              SIDARG(cs->nsess->tcp->sess->sid));
    } else {
        applog(LOG_ERR, "Session open failed");
}
@@ -177,7 +177,7 @@ static void cldu_sess_event(void *priv, uint32_t what)
} else {
if (cs)
            applog(LOG_INFO, "cldc event 0x%x sid " SIDFMT,
-                  what, SIDARG(cs->nsess->udp->sess->sid));
+                  what, SIDARG(cs->nsess->tcp->sess->sid));
        else
            applog(LOG_INFO, "cldc event 0x%x no sid", what);
}
@@ -372,7 +372,7 @@ static int cldu_set_cldc(struct cld_session *cs, int newactive)
}
 
    applog(LOG_INFO, "New CLD session created, sid " SIDFMT,
-          SIDARG(cs->nsess->udp->sess->sid));
+          SIDARG(cs->nsess->tcp->sess->sid));
 
/*
 * First, make sure the base directory exists.
diff --git a/cld/cld.h b/cld/cld.h
index 17f14b8..b1f9bbf 100644
--- a/cld/cld.h
+++ b/cld/cld.h
@@ -30,6 +30,7 @@
 #include cld_common.h
 #include hail_log.h
 #include hail_private.h
+#include ubbp.h
 
 struct client;
 struct session_outpkt;
@@ -43,10 +44,39 @@ enum {
    SFL_FOREGROUND  = (1 << 0), /* run in foreground */
 };
 
+struct atcp_read {
+   void                *buf;
+   unsigned int        buf_size;
+   unsigned int        bytes_wanted;
+   unsigned int        bytes_read;
+
+   void                (*cb)(void *, bool);
+   void                *cb_data;
+
+   struct list_head    node;
+};
+
+struct atcp_read_state {
+   struct list_head    q;
+};
+
 struct client {
+   int                 fd;
+
+   struct event        ev;
+   short               ev_mask;    /* EV_READ and/or EV_WRITE */
+
    struct sockaddr_in6 addr;       /* inet address */
    socklen_t           addr_len;   /* inet address len */
    char                addr_host[64];  /* ASCII version of inet addr */
+   char                addr_port[16];  /* ASCII version of inet addr */
+
+   struct atcp_read_state  rst;
+
+   struct ubbp_header  ubbp;
+
+   char                raw_pkt[CLD_RAW_MSG_SZ];
+   unsigned int        raw_size;
 };
 
 struct session {
@@ -124,6 +154,17 @@ struct pkt_info {
size_t  hdr_len;
 };
 
+#define ___constant_swab32(x) ((uint32_t)(                 \
+   (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |     \
+   (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |     \
+   (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |     \
+   (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
+
+static inline uint32_t swab32(uint32_t v)
+{
+   return ___constant_swab32(v);
+}
+
 /* msg.c */
 extern int inode_lock_rescan(DB_TXN *txn, cldino_t inum);
 extern void msg_get(struct session *sess, const void *v);
@@ -178,7 +219,7 @@ extern int sess_load(GHashTable *ss);
 extern struct server cld_srv;
 extern struct hail_log srv_log;
 extern struct timeval current_time;
-extern int udp_tx(int

Re: [patch tabled 6/8] Add filesystem back-end

2010-12-13 Thread Jeff Garzik

On 11/28/2010 08:41 PM, Pete Zaitcev wrote:

This patch adds the first new back-end and makes some changes to the way
nodes are added, to make the invariants of storage_node more sensible.

The filesystem back-end itself is not intended for production use,
so it makes no attempt to run any asynchronous transfers.

We also add a test. Note that this differs from the preliminary versions
of this patch. We used to add both chunk and fs back-ends, so that tabled
replicates to both. This makes sense as a test of store path, but on
retrieval tabled selects any one of available storage nodes with the
object, randomly. It creates gaps in test coverage in any given run.
Therefore, we test two back-end types sequentially now.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  server/Makefile.am   |2
  server/stor_chunk.c  |   21 -
  server/stor_fs.c |  498 +
  server/storage.c |  157 ++--
  server/storparse.c   |   97 +++
  server/tabled.h  |   31 ++
  test/Makefile.am |3
  test/be_fs-test.conf |5
  test/combo-redux |   74 ++
  test/prep-db |4
  test/start-daemon|1
  test/stop-daemon |9
  12 files changed, 835 insertions(+), 67 deletions(-)

commit bccedeedabbe713e4053afa185314b3f57f3d204
Author: Pete Zaitcev <zait...@yahoo.com>
Date:   Sun Nov 28 17:58:05 2010 -0700

 Add fs back-end, with a test.

diff --git a/server/Makefile.am b/server/Makefile.am
index 52beec4..71bcb35 100644
--- a/server/Makefile.am
+++ b/server/Makefile.am
@@ -6,7 +6,7 @@ sbin_PROGRAMS   = tabled tdbadm
  tabled_SOURCES= tabled.h  \
  bucket.c cldu.c config.c metarep.c object.c replica.c \
  server.c status.c storage.c storparse.c \
- stor_chunk.c util.c
+ stor_chunk.c stor_fs.c util.c
  tabled_LDADD  = ../lib/libtdb.a   \
  @HAIL_LIBS@ @PCRE_LIBS@ @GLIB_LIBS@ \
  @CRYPTO_LIBS@ @DB4_LIBS@ @EVENT_LIBS@ @SSL_LIBS@
diff --git a/server/stor_chunk.c b/server/stor_chunk.c
index 815adcf..7462a9c 100644
--- a/server/stor_chunk.c
+++ b/server/stor_chunk.c
@@ -31,8 +31,7 @@
  #include <netdb.h>
  #include "tabled.h"

-static const char stor_key_fmt[] = "%016llx";
-#define STOR_KEY_SLEN  16
+static const char stor_key_fmt[] = STOR_KEY_FMT;

  static int stor_new_stc(struct storage_node *stn, struct st_client **stcp)
  {
@@ -66,24 +65,6 @@ static int stor_new_stc(struct storage_node *stn, struct st_client **stcp)
return 0;
  }

-static void stor_read_event(int fd, short events, void *userdata)
-{
-   struct open_chunk *cep = userdata;
-
-   cep->r_armed = false;   /* no EV_PERSIST */
-   if (cep->ocb)
-       (*cep->ocb)(cep);
-}
-
-static void stor_write_event(int fd, short events, void *userdata)
-{
-   struct open_chunk *cep = userdata;
-
-   cep->w_armed = false;   /* no EV_PERSIST */
-   if (cep->ocb)
-       (*cep->ocb)(cep);
-}
-
  /*
   * Open *cep using stn, set up chunk session if needed.
   */
diff --git a/server/stor_fs.c b/server/stor_fs.c
new file mode 100644
index 000..b433a67
--- /dev/null
+++ b/server/stor_fs.c
@@ -0,0 +1,498 @@
+
+/*
+ * Copyright 2010 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ */
+
+#define _GNU_SOURCE
+#include "tabled-config.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syslog.h>
+#include <string.h>
+#include <glib.h>
+#include <event.h>
+#include "tabled.h"
+
+static const char stor_key_fmt[] = STOR_KEY_FMT;
+
+static char *fs_obj_pathname(const char *base, uint64_t key)
+{
+   enum { PREFIX_LEN = 3 };
+   char prefix[PREFIX_LEN + 1];
+   char stckey[STOR_KEY_SLEN+1];
+   char *s;
+   int rc;
+
+   /* we know that stckey is going to be longer than PREFIX_LEN */
+   sprintf(stckey, stor_key_fmt, (unsigned long long) key);
+   memcpy(prefix, stckey, PREFIX_LEN);
+   prefix[PREFIX_LEN] = 0;
+
+   rc = asprintf(&s, "%s/%s/%s", base, prefix, stckey + PREFIX_LEN);
+   if (rc < 0)
+   goto err_out;
+
+   return s;
+
+err_out:
+   return NULL;
+}
+
+static char *fs_ctl_pathname(const char *base, const char *file)
+{
+   char *s;
+   int rc;
+
+   rc = asprintf(&s, "%s/%s", base, file);
+   if 
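The (truncated) fs back-end excerpt above fans objects out into subdirectories named after the first three hex digits of the formatted key. A standalone sketch of that path scheme follows, using snprintf() instead of asprintf() so it can be tried in isolation. It assumes STOR_KEY_FMT expands to "%016llx" (the format this patch removes from stor_chunk.c) and that STOR_KEY_SLEN is 16; fs_obj_path_sketch() is a hypothetical name, not part of tabled.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { KEY_SLEN = 16, PREFIX_LEN = 3 };

/*
 * Hypothetical re-creation of the fs back-end path scheme:
 *   <base>/<first 3 hex digits>/<remaining 13 hex digits>
 * Returns a malloc'd string the caller must free, or NULL on failure.
 */
static char *fs_obj_path_sketch(const char *base, uint64_t key)
{
	char stckey[KEY_SLEN + 1];
	char *s;
	int len;

	/* Assumes STOR_KEY_FMT is "%016llx": always exactly 16 hex digits. */
	snprintf(stckey, sizeof(stckey), "%016llx", (unsigned long long)key);

	/* First call measures the result, second call fills the buffer. */
	len = snprintf(NULL, 0, "%s/%.*s/%s", base, PREFIX_LEN, stckey,
		       stckey + PREFIX_LEN);
	if (len < 0)
		return NULL;
	s = malloc((size_t)len + 1);
	if (!s)
		return NULL;
	snprintf(s, (size_t)len + 1, "%s/%.*s/%s", base, PREFIX_LEN, stckey,
		 stckey + PREFIX_LEN);
	return s;
}
```

With a three-hex-digit prefix there are at most 16^3 = 4096 first-level directories, which presumably keeps any single directory from growing very large.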

Re: [patch tabled 8/8] Add Swift back-end

2010-12-13 Thread Jeff Garzik

On 11/28/2010 08:41 PM, Pete Zaitcev wrote:

This patch allows tabled to use the OpenStack Swift object store as if it
were our chunkserver, with some extra tricks. The configuration has to be
entered manually into CLD, just like in case of filesystem back-end.

The code is fairly experimental, so it retains extra messages.

Also, since Swift authorizes by plaintext passwords, support for SSL is
essential, but is currently missing.

There is no build-time test for this, because it would require us to
depend on OpenStack, which is untenable.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>


applied patches 6-8.  well done, this is a milestone for tabled!




Re: [patch hail 1/2] Add subdomain calling format

2010-12-07 Thread Jeff Garzik

On 12/05/2010 10:53 PM, Pete Zaitcev wrote:

Amazon appears to have given up on forcing users to migrate, and the
bucket-in-path format is going to stay. However, they still refuse to list
buckets from other regions on the default endpoint, which leads to annoying
indirection (need to know the region somehow before listing). It is easier
just to use the subdomain format in one invocation.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  include/hstor.h |6 +
  lib/hstor.c |  178 +-
  2 files changed, 106 insertions(+), 78 deletions(-)


applied 1-2
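The two S3 calling formats differ only in where the bucket name goes: path style puts it in the URL path on a shared endpoint, while the subdomain (virtual-hosted) style makes it part of the hostname, so the bucket's own DNS name can route the request to its region. The sketch below is illustrative only: s3_url() is a hypothetical helper, not hstor's implementation, and it assumes plain http, a DNS-safe bucket name, and a key that needs no escaping.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: build an S3-style URL for bucket/key on host.
 *   path style:      http://host/bucket/key
 *   subdomain style: http://bucket.host/key
 * Returns snprintf()'s result (length needed, excluding the NUL).
 * No escaping is done; bucket must already be a valid DNS label.
 */
static int s3_url(char *out, size_t outlen, const char *host,
		  const char *bucket, const char *key, bool subdomain)
{
	if (subdomain)
		return snprintf(out, outlen, "http://%s.%s/%s",
				bucket, host, key);
	return snprintf(out, outlen, "http://%s/%s/%s", host, bucket, key);
}
```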




Re: [patch tabled 1/8] Shuffle fields of storage nodes

2010-12-07 Thread Jeff Garzik

On 11/28/2010 08:39 PM, Pete Zaitcev wrote:

This mostly helps make copy-paste safer later.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  server/object.c  |2 -
  server/storage.c |   79 ++---
  server/tabled.h  |   12 +++---
  3 files changed, 53 insertions(+), 40 deletions(-)


applied 1-5

Gonna give the file backend a cursory test, and swift backend a 
slightly-more-than-cursory test, then merge those.


Well done!  Pluggable storage backends make tabled more interesting.




Re: AC_CONFIG_MACRO_DIR([m4])

2010-12-06 Thread Jeff Garzik

On 12/05/2010 04:56 PM, Pete Zaitcev wrote:

Autoconf printed a warning when reconfiguring Hail, so I gave up and
added this:

[...]

Now I have a directory m4/ with symlinks... This does not seem to be
helping any portability, unless I miss where the promised macros are
being saved locally. What was this about, do you happen to know?


I presume you refer to this:


[jgar...@bd hail]$ ./autogen.sh && CFLAGS="-O2 -Wall -Wshadow -g -march=native" ./configure --disable-shared
libtoolize: putting auxiliary files in `.'.
libtoolize: linking file `./ltmain.sh'
libtoolize: You should add the contents of the following files to `aclocal.m4':
libtoolize:   `/usr/share/aclocal/libtool.m4'
libtoolize:   `/usr/share/aclocal/ltoptions.m4'
libtoolize:   `/usr/share/aclocal/ltversion.m4'
libtoolize:   `/usr/share/aclocal/lt~obsolete.m4'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
libtoolize: putting auxiliary files in `.'.
libtoolize: linking file `./ltmain.sh'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.


Think about what this implies:

Keeping the correct libtool macros in-tree implies adding a pointless 
maintenance burden.  The distro always gives us correct, up-to-date 
files.  Why would hail want to potentially lag upstream's version of 
these macros, forcing us to manually track macros that are currently 
updated automatically for each ./autogen.sh invocation?


It is this same silly logic that leads programmers to ship in-tree 
copies of (for example) zlib.


Therefore, the requirement to rebuild hail's configure script is to have 
a recent distro.


Users of tarballs never see this, so this is only an issue for those on 
oddball or ancient OS's, who are building release tarballs, or working 
directly out of the git repo.


And if someone is doing that, they have a lot more headaches than just 
outdated libtool to contend with.


Jeff




Re: AC_CONFIG_MACRO_DIR([m4])

2010-12-06 Thread Jeff Garzik

On 12/06/2010 12:44 PM, Pete Zaitcev wrote:

On Mon, 06 Dec 2010 12:32:22 -0500
Jeff Garzik <j...@garzik.org> wrote:


Keeping the correct libtool macros in-tree implies adding a pointless
maintenance burden.  The distro always gives us correct, up-to-date
files.  Why would hail want to potentially lag upstream's version of
these macros, forcing us to manually track macros that are currently
updated automatically for each ./autogen.sh invocation?


I presumed that the important part is a compatibility between the
syntax used in various .am files and the libtool scriptography that
underpins them. Lagging upstream has no downside in this case
(unlike zlib, where security fixes may exist).


It does not seem optimal to run a current libtool with outdated macro 
files.  In all cases except current one, you're checking in third party, 
maintained, versioned files to hail.git where they will be less-well 
maintained, and generally out-of-date vis a vis current [upstream | Fedora].


Where is the value in performing this additional work, besides silencing 
a warning seen only by git repo users?




Users of tarballs never see this, so this is only an issue for those on
oddball or ancient OS's, who are building release tarballs, or working
directly out of the git repo.


Well, if you say so...


Do you have knowledge to the contrary?

Jeff




Re: [patch hail] remove duplicated stc_readport

2010-10-26 Thread Jeff Garzik

On 10/26/2010 03:47 PM, Pete Zaitcev wrote:

Now that we have a common library for Hail, an opportunity opens to trim
some duplication, such as stc_readport. It even had a comment about it.

Note that we leave cld_readport in the API for a few weeks, while I get
my tabled trees and RPMs in order. Unfortunately we routinely neglect
to set a specific version in RPM headers (e.g. no Requires: cld >= 0.8.2).

Also, get rid of g_file_get_contents. Talk about pointless: it requires
caller to free memory, and it's not like code is any more compact or
easier to understand.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>


applied

it would be nice if a follow-up patch moved the hail_readport() 
definition into a more generic, not-CLD-specific header such as 
include/hail.h[1]


Jeff


[1] which doesn't exist yet.  maybe we could rename hail_log.h to 
hail.h, and make hail.h a dumping ground for hail-generic items.
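The body of hail_readport() is not shown in this thread. Following Pete's note about dropping g_file_get_contents(), here is a hypothetical sketch of a port-file reader that needs no heap allocation from the caller. The name readport_sketch(), its errno-style return convention, and the assumed file format (a single decimal port number, optionally newline-terminated) are all assumptions, not hail's actual code.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: read a decimal TCP port number from fname, as
 * written by a daemon's port file.  Returns the port (1..65535) or a
 * negative errno value on failure.  Nothing for the caller to free.
 */
static int readport_sketch(const char *fname)
{
	char buf[32];
	char *end;
	long port;
	FILE *f;

	f = fopen(fname, "r");
	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	port = strtol(buf, &end, 10);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;		/* not a plain decimal number */
	if (port < 1 || port > 65535)
		return -ERANGE;
	return (int)port;
}
```

Returning a negative errno keeps the success path a plain int, so a caller can write `port = readport_sketch(path); if (port < 0) ...` without any cleanup.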





tabled + atcp

2010-10-23 Thread Jeff Garzik

Just committed this:


commit 57c4be44cdfa6c0cda6cf26d19e8048a945c5a78
Author: Jeff Garzik <j...@garzik.org>
Date:   Sat Oct 23 14:01:20 2010 -0400

Use libhail's atcp rather than our own async TCP write code.

Should be functionally equivalent, as atcp originated from tabled
code.


Please test, and highlight any behavior differences with vanilla tabled 
v0.5.2.


Jeff





Re: [PATCH hail] const-correctness tweaks

2010-10-22 Thread Jeff Garzik

On 10/20/2010 04:53 AM, Jim Meyering wrote:

Jeff Garzik wrote:
...

Hi Jeff.

Sorry I didn't notice that the first time.
I built with ./autogen.sh && ./configure && make.
It looks like you recommend -Wall -Wshadow.

The two warnings above are the only ones I see with the patch,
and they're easy to fix.  When storing const pointer params into
a struct like that, I've found that it's best to cast away the const,
which really does reflect the semantics: by using const on the
parameter, I view it as promising not to deref through the pointer
*in that function*.  Since it's usually not reasonable to make
the struct member const (as you saw, it propagates too far
and often ends up being contradictory), the lesser evil of the cast
is preferable here.
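The "cast away const when storing" pattern Jim describes can be sketched as follows. The struct and function names (`hdr_rec`, `hdr_rec_set`) are hypothetical, loosely modeled on the hreq_hdr_push warning; they are not hail's actual definitions.

```c
/* Hypothetical header record; the members stay non-const because other
 * code paths may own, modify or free what they point to. */
struct hdr_rec {
	char *key;
	char *val;
};

/*
 * const on the parameters only promises that *this function* does not
 * write through key/val.  Storing them in the non-const members needs
 * an explicit cast; without it, gcc warns that the assignment discards
 * qualifiers from the pointer target type.
 */
static void hdr_rec_set(struct hdr_rec *h, const char *key, const char *val)
{
	h->key = (char *) key;
	h->val = (char *) val;
}
```

Callers can then pass string literals or other read-only data without casts of their own, e.g. hdr_rec_set(&h, "Content-Type", "text/plain"); the cast stays confined to the one place where the pointer is stored.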

If you're still game, the following incremental patch seems to be
enough for me:  Let me know and I'll resubmit the full one.


Well, my primary concern now originates from curl_easy_setopt(3)
documentation:

CURLOPT_WRITEFUNCTION
   Function pointer that  should  match  the  following
  prototype: size_t  function(  void  *ptr,  size_t  size,
  size_t nmemb, void *stream);

hstor's callback is passed directly to libcurl, so we seem to be bound
by outside constraints, no?
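For comparison, a const-correct write callback of the shape libcurl's typecheck accepts (const void * buffer) might look like the sketch below. The `membuf` type and `membuf_write_cb` name are hypothetical; the only libcurl behavior relied on is that the callback receives size * nmemb bytes and must return that same count to signal success. Registering it would use the real CURLOPT_WRITEFUNCTION and CURLOPT_WRITEDATA options of curl_easy_setopt.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that accumulates a response body. */
struct membuf {
	char	*data;
	size_t	len;
};

/*
 * Write callback with a const buffer parameter: it only reads from ptr.
 * Returns the number of bytes consumed; returning anything other than
 * size * nmemb tells libcurl to abort the transfer.
 */
static size_t membuf_write_cb(const void *ptr, size_t size, size_t nmemb,
			      void *userdata)
{
	struct membuf *mb = userdata;
	size_t n;
	char *p;

	if (nmemb && size > (size_t) -1 / nmemb)
		return 0;		/* size * nmemb would overflow */
	n = size * nmemb;

	p = realloc(mb->data, mb->len + n + 1);
	if (!p)
		return 0;		/* OOM: report failure to libcurl */
	memcpy(p + mb->len, ptr, n);
	mb->len += n;
	p[mb->len] = '\0';		/* keep the buffer usable as a C string */
	mb->data = p;
	return n;
}
```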


I compiled hail (with that patch) on F13 with -Wall -Wshadow
with no warnings.  That curl_easy_setopt documentation seems to be
overly strict, or perhaps out of date?  When I compare with the
code (curl/typecheck-gcc.h), I see all of the necessary const attributes:


/* evaluates to true if expr is of type curl_write_callback or similar */
#define _curl_is_write_cb(expr)   \
   (_curl_is_read_cb(expr) ||\
__builtin_types_compatible_p(__typeof__(expr), __typeof__(fwrite)) ||  \
__builtin_types_compatible_p(__typeof__(expr), curl_write_callback) || \
_curl_callback_compatible((expr), _curl_write_callback1) ||\
_curl_callback_compatible((expr), _curl_write_callback2) ||\
_curl_callback_compatible((expr), _curl_write_callback3) ||\
_curl_callback_compatible((expr), _curl_write_callback4) ||\
_curl_callback_compatible((expr), _curl_write_callback5) ||\
_curl_callback_compatible((expr), _curl_write_callback6))
typedef size_t (_curl_write_callback1)(const char *, size_t, size_t, void*);
typedef size_t (_curl_write_callback2)(const char *, size_t, size_t,
const void*);
typedef size_t (_curl_write_callback3)(const char *, size_t, size_t, FILE*);
typedef size_t (_curl_write_callback4)(const void *, size_t, size_t, void*);
typedef size_t (_curl_write_callback5)(const void *, size_t, size_t,
const void*);
typedef size_t (_curl_write_callback6)(const void *, size_t, size_t, FILE*);
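The curl macros above rely on gcc's __builtin_types_compatible_p. A reduced sketch of the same idea shows why a callback taking const void * matches a const typedef but not the const-free prototype from the man page. The `CB_COMPATIBLE` macro and the typedef names are hypothetical simplifications of the header excerpt; the sketch assumes gcc or clang.

```c
#include <stddef.h>

/* Function types (not pointer types), like curl's _curl_write_callbackN. */
typedef size_t (wr_cb_plain)(void *, size_t, size_t, void *);
typedef size_t (wr_cb_const)(const void *, size_t, size_t, void *);

/* True if f is a function of type t, or a pointer to one. */
#define CB_COMPATIBLE(f, t)						\
	(__builtin_types_compatible_p(__typeof__(f), t) ||		\
	 __builtin_types_compatible_p(__typeof__(f), t *))

/* A const-correct write callback that reports every byte as consumed. */
static size_t sink_cb(const void *ptr, size_t size, size_t nmemb, void *ud)
{
	(void) ptr;
	(void) ud;
	return size * nmemb;
}

/* 1 for wr_cb_const; 0 for wr_cb_plain, because a const void * parameter
 * and a void * parameter make the two function types incompatible. */
static const int sink_is_const_cb = CB_COMPATIBLE(sink_cb, wr_cb_const);
static const int sink_is_plain_cb = CB_COMPATIBLE(sink_cb, wr_cb_plain);
```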


But even if curl were requiring some suboptimal signature,
it would be nice not to impose that on all projects that use hail.
Are there older curl headers that do require the const-free signature?
If there are and you want to support them, too, let me know -- maybe
I can cook up an autoconf test to make things work there, with minimal
impact.


Nah, I wouldn't worry about the const signature, it's probably just out 
of date documentation.  If users appear running old OS's or OS versions, 
we can tackle autoconf'ing on a piecemeal basis as needs arise.


Committed these patches of yours to hail.git and tabled.git.

Jeff





hail version 0.7.2 released

2010-10-22 Thread Jeff Garzik

Home: https://hail.wiki.kernel.org/
Git: git://git.kernel.org/pub/scm/daemon/distsrv/hail.git
Download: http://www.kernel.org/pub/software/network/distsrv/hail/

Version 0.7.2 release notes (NEWS):

- cld: read overrun bug fix
- chunkd: add checksum table to disk format, one checksum per 64k of obj data
- chunkd, libhail: add new GET_PART operation for partial object retrieval
- chunkd: bug fixes
- chunkd: use libevent (again) for main loop polling
- libhail: add async TCP network writing API, atcp_wr*
- libhail: bug fixes
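With one checksum per 64k of object data, a partial read (GET_PART) would need to verify every checksum block that the requested byte range overlaps. The block arithmetic can be sketched as below; `CHK_BLK_SZ` and `blk_range` are hypothetical names, and the sketch says nothing about which checksum algorithm chunkd actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHK_BLK_SZ	(64 * 1024)	/* one checksum per 64 KiB of data */

/*
 * Compute the first and last checksum-block indices overlapped by the
 * byte range [off, off + len).  Returns false for an empty range or one
 * whose end would wrap around.
 */
static bool blk_range(uint64_t off, uint64_t len,
		      uint64_t *first, uint64_t *last)
{
	if (len == 0 || off + len < off)
		return false;
	*first = off / CHK_BLK_SZ;
	*last = (off + len - 1) / CHK_BLK_SZ;
	return true;
}
```

For example, a 2-byte request at offset 65535 straddles blocks 0 and 1, so both checksums must be checked even though only two bytes are returned.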

This release includes incompatible API and on-disk format changes.

Git shortlog attached.

Jeff Garzik (12):
  chunkd: Add checksum table to on-disk format, one sum per 64k of data
  chunkd: checksum data prior to returning via GET
  chunk: Add Get-Partial-Object (GET_PART) operation
  lib/chunksrv.c: add FIXME
  chunkd: internal 32/64-bit type fixes
  test/chunkd/get-part: read and test segment of randomized memory
  libhail: add async TCP network writing API, atcp_wr*
  Use libevent in chunkd, rather than hand-rolled server-poll functionality.
  atcp: extract pre- and post-writev code into separate functions
  Merge branch 'chunkd-libevent'
  chunkd: properly checksum a multi-block range
  Release version 0.7.2.

Jim Meyering (4):
  cld: don't expect inode name to be NUL-terminated (avoid read overrun)
  lib/hstor.c: avoid an unconditional leak in append_qparam
  chunkd: don't leak an FS object iterator
  libhail/hstor: const-correctness tweaks

Pete Zaitcev (3):
  libhail: Fix calling convention of huri_field_escape
  Change cfgfile.txt into a real config file
  pkg: add doc/setup.txt to install
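Jim's cld fix ("don't expect inode name to be NUL-terminated") concerns names that arrive as a pointer plus a length. The general pattern is to copy exactly the counted bytes and terminate explicitly, never calling strlen or strcpy on the raw field. A minimal sketch; `copy_counted_name` is a hypothetical helper, not the code from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a counted (pointer, length) name that may lack a NUL terminator
 * into dst, truncating to fit and always terminating the result.
 * Returns the number of name bytes copied.
 */
static size_t copy_counted_name(char *dst, size_t dstsz,
				const char *src, size_t srclen)
{
	size_t n;

	if (dstsz == 0)
		return 0;
	n = (srclen < dstsz - 1) ? srclen : dstsz - 1;
	memcpy(dst, src, n);	/* reads exactly n bytes, never past srclen */
	dst[n] = '\0';
	return n;
}
```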



tabled version 0.5.2 released

2010-10-22 Thread Jeff Garzik

Home: https://hail.wiki.kernel.org/
Git: git://git.kernel.org/pub/scm/daemon/distsrv/tabled.git
Download: http://www.kernel.org/pub/software/network/distsrv/tabled/

Version 0.5.2 release notes (NEWS):

- Permit randomly allocated TCP port, for db4 replication master
- Install etc.tabled.conf as a useful example configuration
- minor testsuite additions
- many minor bug fixes

Git shortlog attached.



Re: tabled version 0.5.2 released

2010-10-22 Thread Jeff Garzik

On 10/22/2010 11:39 PM, Jeff Garzik wrote:

Home: https://hail.wiki.kernel.org/
Git: git://git.kernel.org/pub/scm/daemon/distsrv/tabled.git
Download: http://www.kernel.org/pub/software/network/distsrv/tabled/

Version 0.5.2 release notes (NEWS):

- Permit randomly allocated TCP port, for db4 replication master
- Install etc.tabled.conf as a useful example configuration
- minor testsuite additions
- many minor bug fixes

Git shortlog attached.


er, now it's attached.

Jeff Garzik (2):
  test/.gitignore: ignore list-keys test
  Release version 0.5.2.

Jim Meyering (6):
  server/server.c (net_write_port): Don't ignore write error.
  server/server.c: use sizeof(s) rather than equivalent 64
  don't dereference NULL on OOM
  server/status.c: don't deref NULL on failed strdup
  server/bucket.c: don't deref NULL upon failed malloc
  adapt to changed signature of hstor_get's callback function

Pete Zaitcev (8):
  Fix crash when stopping slave
  Clean name vs host
  cleanup a call to closelog()
  Support auto replicaton port
  test/start-daemon: Factor 3 pid-checking if blocks into a loop.
  test/start-daemon: Ignore stale .pid files.
  Add a test for hstor_keys
  Install etc.tabled.conf



Re: hail version 0.7.2 released

2010-10-22 Thread Jeff Garzik

On 10/22/2010 11:22 PM, Jeff Garzik wrote:

Home: https://hail.wiki.kernel.org/
Git: git://git.kernel.org/pub/scm/daemon/distsrv/hail.git
Download: http://www.kernel.org/pub/software/network/distsrv/hail/


It seems that kernel.org mirroring is broken or extremely slow at the 
moment.  The releases should appear as soon as kernel.org mirrors pick 
back up.


Jeff




Re: [PATCH hail] const-correctness tweaks

2010-10-20 Thread Jeff Garzik

On 10/20/2010 04:00 AM, Jim Meyering wrote:

Jeff Garzik wrote:


On 10/06/2010 08:07 AM, Jim Meyering wrote:


Make write_cb callback's buffer parameter const, like all write-like
functions.
Give a few char * parameters the const attribute.

Signed-off-by: Jim Meyering <meyer...@redhat.com>
---

It looks like most of hail's interfaces are const-correct,
but one stood out because it provokes a warning when I tried to
pass a const-correct write_cb function to hstor_get from iwhd:

  proxy.c:382: warning: passing argument 4 of 'hstor_get' from \
incompatible pointer type
  /usr/include/hstor.h:173: note: expected \
'size_t (*)(void *, size_t, size_t,  void *)' but argument is of type \
'size_t (*)(const void *, size_t,  size_t,  void *)'

In case you feel comfortable fixing this, here's a patch:



Sorry for not getting back to this; I had hoped to solve some
additional problems that cropped up, but didn't have time.  So, to
forestall further delay,

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -I../include -pthread
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
-I/usr/include/libxml2 -O2 -Wall -Wshadow -g -MT hutil.lo -MD -MP -MF
.deps/hutil.Tpo -c hutil.c -o hutil.o
hutil.c: In function ‘hreq_hdr_push’:
hutil.c:145: warning: assignment discards qualifiers from pointer
target type
hutil.c:146: warning: assignment discards qualifiers from pointer
target type

warnings appear after this patch.  When solving these warnings with
'const' markers, it quickly becomes a bit of a rat's nest.

At a minimum, the write_cb callback signature must match libcurl's,
which does not use 'const'.  I can see this makes sense from libcurl
implementation's perspective, even if it does not really match the
constness one expects from a foo-get function.


Hi Jeff.

Sorry I didn't notice that the first time.
I built with ./autogen.sh && ./configure && make.
It looks like you recommend -Wall -Wshadow.

The two warnings above are the only ones I see with the patch,
and they're easy to fix.  When storing const pointer params into
a struct like that, I've found that it's best to cast away the const,
which really does reflect the semantics: by using const on the
parameter, I view it as promising not to deref through the pointer
*in that function*.  Since it's usually not reasonable to make
the struct member const (as you saw, it propagates too far
and often ends up being contradictory), the lesser evil of the cast
is preferable here.

If you're still game, the following incremental patch seems to be
enough for me:  Let me know and I'll resubmit the full one.


Well, my primary concern now originates from curl_easy_setopt(3) 
documentation:


   CURLOPT_WRITEFUNCTION
  Function pointer that  should  match  the  following
  prototype: size_t  function(  void  *ptr,  size_t  size,
  size_t nmemb, void *stream);

hstor's callback is passed directly to libcurl, so we seem to be bound 
by outside constraints, no?


Jeff




Re: [PATCH hail] const-correctness tweaks

2010-10-07 Thread Jeff Garzik

On 10/06/2010 08:07 AM, Jim Meyering wrote:


Make write_cb callback's buffer parameter const, like all write-like
functions.
Give a few char * parameters the const attribute.

Signed-off-by: Jim Meyering <meyer...@redhat.com>
---

It looks like most of hail's interfaces are const-correct,
but one stood out because it provokes a warning when I tried to
pass a const-correct write_cb function to hstor_get from iwhd:

 proxy.c:382: warning: passing argument 4 of 'hstor_get' from \
   incompatible pointer type
 /usr/include/hstor.h:173: note: expected \
   'size_t (*)(void *, size_t, size_t,  void *)' but argument is of type \
   'size_t (*)(const void *, size_t,  size_t,  void *)'

In case you feel comfortable fixing this, here's a patch:


  include/hstor.h |4 ++--
  lib/hstor.c |5 +++--
  lib/hutil.c |2 +-
  3 files changed, 6 insertions(+), 5 deletions(-)


This requires updating test/large-object.c in tabled, too.  Would you 
mind sending along that companion patch?





Re: CLD multi-node status

2010-09-30 Thread Jeff Garzik

On 09/30/2010 04:55 AM, Geert Jansen wrote:

is it correct that CLD is basically single-master right now? I can't
find any trace of the mentioned Paxos implementation in the source.


The current main branch is single-master, correct.  The 'replica' branch
of hail.git contains the multi-node server, where the Paxos implementation
is imported from the db4 replication engine.  No multi-node client lib
update exists, however.


Look for this to change in the next 2 weeks, though!

Jeff




Re: [PATCH hail] chunkd: don't leak an FS object iterator

2010-09-30 Thread Jeff Garzik

On 09/29/2010 11:20 AM, Jim Meyering wrote:


chk_list_objs called fs_list_objs_open without also calling
fs_list_objs_close.

  32,808 bytes in 1 blocks are definitely lost in loss record 413 of 419
 at 0x4A0515D: malloc (vg_replace_malloc.c:195)
 by 0x31BA8A26D0: __alloc_dir (opendir.c:184)
 by 0x405619: fs_list_objs_open (be-fs.c:974)
 by 0x40B202: chk_list_objs (selfcheck.c:41)
 by 0x40B575: chk_dbscan (selfcheck.c:131)
 by 0x40B628: chk_thread_scan (selfcheck.c:147)
 by 0x40B757: chk_thread_command (selfcheck.c:179)
 by 0x40B890: chk_thread_func (selfcheck.c:219)
 by 0x31BC464E83: g_thread_create_proxy (gthread.c:1893)
 by 0x31BB407760: start_thread (pthread_create.c:301)
 by 0x31BA8E151C: clone (clone.S:115)
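The leak pattern here is an open call without its matching close; in the trace, fs_list_objs_open reaches opendir, whose internal buffer is what valgrind reports. The same discipline in plain POSIX terms is sketched below. `count_dir_entries` is a hypothetical example, not chunkd code; it simply makes sure closedir runs on every path after a successful opendir.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in a directory, excluding "." and "..".
 * Returns -1 if the directory cannot be opened.  Every successful
 * opendir() is paired with closedir(), so the buffer opendir()
 * allocates is always released.
 */
static long count_dir_entries(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	long n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		n++;
	}
	closedir(d);		/* the step whose omission leaks */
	return n;
}
```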


After seeing a few valgrind references from you, I'm curious... do you 
by chance happen to have a valgrind suppression file for openssl on Fedora?


I've been wanting to run valgrind on chunkd, but each time I attempt it, 
I -- and valgrind -- have been overwhelmed by openssl false positives. 
openssl, deep in its RAND_xxx functions, intentionally does crazy stuff 
like using random, uninitialized stack contents as RNG entropy.  Cute, 
but valgrind quite rightly complains loudly about it.


It's a topic I've been meaning to research, because I currently lack the 
valgrind-fu necessary to have an effective valgrind+chunkd session.


Thanks,

Jeff






Re: [PATCH hail] chunkd: don't leak an FS object iterator

2010-09-30 Thread Jeff Garzik

On 09/29/2010 11:20 AM, Jim Meyering wrote:


chk_list_objs called fs_list_objs_open without also calling
fs_list_objs_close.

  32,808 bytes in 1 blocks are definitely lost in loss record 413 of 419
 at 0x4A0515D: malloc (vg_replace_malloc.c:195)
 by 0x31BA8A26D0: __alloc_dir (opendir.c:184)
 by 0x405619: fs_list_objs_open (be-fs.c:974)
 by 0x40B202: chk_list_objs (selfcheck.c:41)
 by 0x40B575: chk_dbscan (selfcheck.c:131)
 by 0x40B628: chk_thread_scan (selfcheck.c:147)
 by 0x40B757: chk_thread_command (selfcheck.c:179)
 by 0x40B890: chk_thread_func (selfcheck.c:219)
 by 0x31BC464E83: g_thread_create_proxy (gthread.c:1893)
 by 0x31BB407760: start_thread (pthread_create.c:301)
 by 0x31BA8E151C: clone (clone.S:115)

Signed-off-by: Jim Meyering <meyer...@redhat.com>


applied




[chunkd patch] convert to libevent

2010-09-30 Thread Jeff Garzik

For a nice code savings...

 chunkd/Makefile.am |1 
 chunkd/chunkd.h|   28 +
 chunkd/cldu.c  |   64 +--
 chunkd/server.c|  289 +
 chunkd/util.c  |   23 
 configure.ac   |3 
 6 files changed, 116 insertions(+), 292 deletions(-)

diff --git a/chunkd/Makefile.am b/chunkd/Makefile.am
index 78bba72..a45a89b 100644
--- a/chunkd/Makefile.am
+++ b/chunkd/Makefile.am
@@ -10,4 +10,5 @@ chunkd_SOURCES= chunkd.h  \
  objcache.c
 chunkd_LDADD   = \
  ../lib/libhail.la @GLIB_LIBS@ @CRYPTO_LIBS@ \
+ @EVENT_LIBS@ \
  @SSL_LIBS@ @TOKYOCABINET_LIBS@ @XML_LIBS@ @LIBCURL@
diff --git a/chunkd/chunkd.h b/chunkd/chunkd.h
index 937573c..5be155a 100644
--- a/chunkd/chunkd.h
+++ b/chunkd/chunkd.h
@@ -28,7 +28,7 @@
 #include <chunk_msg.h>
 #include <hail_log.h>
 #include <tchdb.h>
-#include <cldc.h>		/* for cld_timer */
+#include <event.h>
 #include objcache.h
 
 #ifndef ARRAY_SIZE
@@ -77,6 +77,8 @@ struct client {
 	char			addr_host[64];	/* ASCII version of inet addr */
 	char			addr_port[16];	/* ASCII version of port */
 	int			fd;		/* socket */
+	struct event		ev;
+	short			ev_mask;	/* EV_READ and/or EV_WRITE */
 
 	char			user[CHD_USER_SZ + 1];
 
@@ -172,18 +174,10 @@ struct server_stats {
unsigned long   opt_write;  /* optimistic writes */
 };
 
-struct server_poll {
-	short			events;		/* POLL* from poll.h */
-	bool			busy;		/* if true, do not poll us */
-
-	/* callback function, data */
-	bool			(*cb)(int fd, short events, void *userdata);
-	void			*userdata;
-};
-
 struct server_socket {
 	int			fd;
 	const struct listen_cfg	*cfg;
+	struct event		ev;
 	struct list_head	sockets_node;
 };
 
@@ -207,14 +201,15 @@ struct server {
 	char			*pid_file;	/* PID file */
 	int			pid_fd;
 
+	struct event_base	*evbase_main;
+
 	struct list_head	listeners;
 	struct list_head	sockets;	/* points into listeners */
 
-	GHashTable		*fd_info;
-
 	GThreadPool		*workers;	/* global thread worker pool */
 	int			max_workers;
 	int			worker_pipe[2];
+	struct event		worker_ev;
 
 	struct list_head	wr_trash;
 	unsigned int		trash_sz;
@@ -311,11 +306,6 @@ extern void syslogerr(const char *prefix);
 extern void strup(char *s);
 extern int write_pid_file(const char *pid_fn);
 extern int fsetflags(const char *prefix, int fd, int or_flags);
-extern void timer_init(struct cld_timer *timer, const char *name,
-  void (*cb)(struct cld_timer *), void *userdata);
-extern void timer_add(struct cld_timer *timer, time_t expires);
-extern void timer_del(struct cld_timer *timer);
-extern time_t timers_run(void);
 extern char *time2str(char *strbuf, time_t time);
 extern void hexstr(const unsigned char *buf, size_t buf_len, char *outstr);
 
@@ -328,7 +318,7 @@ extern bool cli_err(struct client *cli, enum chunk_errcode code, bool recycle_ok
 extern int cli_writeq(struct client *cli, const void *buf, unsigned int buflen,
 cli_write_func cb, void *cb_data);
 extern bool cli_wr_sendfile(struct client *, cli_write_func);
-extern bool cli_rd_set_poll(struct client *cli, bool readable);
+extern void cli_rd_set_poll(struct client *cli, bool readable);
 extern void cli_wr_set_poll(struct client *cli, bool writable);
 extern bool cli_cb_free(struct client *cli, struct client_write *wr,
 			bool done);
@@ -336,7 +326,7 @@ extern bool cli_write_start(struct client *cli);
 extern int cli_req_avail(struct client *cli);
 extern int cli_poll_mod(struct client *cli);
 extern bool worker_pipe_signal(struct worker_info *wi);
-extern bool tcp_cli_event(int fd, short events, void *userdata);
+extern void tcp_cli_event(int fd, short events, void *userdata);
 extern void resp_init_req(struct chunksrv_resp *resp,
   const struct chunksrv_req *req);
 
diff --git a/chunkd/cldu.c b/chunkd/cldu.c
index dd8b67c..026c523 100644
--- a/chunkd/cldu.c
+++ b/chunkd/cldu.c
@@ -21,6 +21,7 @@
 #include "hail-config.h"
 
 #include <sys/types.h>
+#include <sys/time.h>
 #include <sys/socket.h>
 #include <glib.h>
 #include <syslog.h>
@@ -39,21 +40,23 @@ struct cld_host {
 };
 
 struct cld_session {
-   bool forced_hosts;  /* Administrator overrode default CLD */
-   bool is_dead;
-   struct ncld_sess *nsess;/* library state */
+   bool  

Re: Autostart

2010-09-30 Thread Jeff Garzik
On Wed, Sep 29, 2010 at 7:09 PM, Pete Zaitcev <zait...@redhat.com> wrote:
 An interesting question is what to do when iwhd exits. I decided not to
 kill what was started. So, we have a little self-contained cell of
 tabled, chunkd, S3, based off a certain local directory or other
 namespace anchor. Therefore, when iwhd restarts, it tests if the
 cell is still there, and uses that. It also tests if the service
 started successfully, using the same method.

 As we see, for each service iwhd starts, it needs to verify that
 it's available (either before spawning it, or after). This would be
 done best by talking to the service. But iwhd only has S3 client,
 and no CLD client, so it cannot talk to cld (or chunkd). I had an idea:
 add an autostart feature to tabled.

 Tabled knows how to talk to both chunkd and cld, so it can verify
 that they are running. It would not be that much code. The downside
 is that it's clearly a special case, encoding of a policy. So I am
 asking how objectionable it would be (including do we want tabled -a
 for tests... they kinda run ok as they are).

It seems like quite a special case feature.  tabled is designed to use
multiple chunkd nodes (and hopefully soon, multiple cld nodes).  So
having tabled start chunkd/cld seems misaligned with the existing
design.

That said, if it was possible to write a script or program that
performed autostart without modifying tabled, it would be a nice
addition to the git repository.  tabled.autostart could be a simple
program that pinged cld/chunkd, started if necessary, then exec'd the
real tabled.
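A tabled.autostart wrapper along these lines would need a liveness probe for each service before deciding whether to start it. A minimal sketch, assuming a plain TCP connect is an acceptable check for a TCP service such as chunkd; `tcp_service_up` is a hypothetical helper, and a real check for cld would go through the CLD client library rather than a raw socket.

```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/*
 * Return true if a TCP connection to host:port can be established.
 * Tries every address getaddrinfo() returns and closes each socket
 * immediately; this only proves something is listening.
 */
static bool tcp_service_up(const char *host, const char *port)
{
	struct addrinfo hints, *res, *rp;
	bool up = false;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(host, port, &hints, &res) != 0)
		return false;

	for (rp = res; rp && !up; rp = rp->ai_next) {
		int fd = socket(rp->ai_family, rp->ai_socktype,
				rp->ai_protocol);

		if (fd < 0)
			continue;
		if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
			up = true;
		close(fd);
	}
	freeaddrinfo(res);
	return up;
}
```

After the probes, the wrapper would start whatever is missing and then replace itself with the real daemon via execvp(), so tabled itself needs no autostart logic.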

Something modular and separate like that would be great.

 Jeff


Re: [PATCH hail] lib/hstor.c: avoid an unconditional leak in append_qparam

2010-09-27 Thread Jeff Garzik

On 09/27/2010 04:53 AM, Jim Meyering wrote:



Signed-off-by: Jim Meyering <meyer...@redhat.com>
---
I would have preferred to insert a single line right before the
huri_field_escape call:

 char *v = strdup(val);

[would result in a more compact, single-hunk patch]
but it looks like hail uses the anachronistic (pre-C99)
"declare all vars at outer scope" style, so I conformed.

  lib/hstor.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/lib/hstor.c b/lib/hstor.c
index 6c67bfa..79e0420 100644
--- a/lib/hstor.c
+++ b/lib/hstor.c
@@ -676,6 +676,7 @@ static GString *append_qparam(GString *str, const char *key, const char *val,
 			       char *arg_char)
 {
 	char *stmp;
+	char *v;
 
 	str = g_string_append(str, arg_char);
 	arg_char[0] = '&';
@@ -683,9 +684,11 @@ static GString *append_qparam(GString *str, const char *key, const char *val,
 	str = g_string_append(str, key);
 	str = g_string_append(str, "=");
 
-	stmp = huri_field_escape(strdup(val), QUERY_ESCAPE_MASK);
+	v = strdup(val);
+	stmp = huri_field_escape(v, QUERY_ESCAPE_MASK);
 	str = g_string_append(str, stmp);
 	free(stmp);
+	free(v);



applied

Yeah, I don't like C++ var decls; I think the code gets too 
disorganized, making it really easy to miss a decl when reviewing.


Jeff




--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH hail] lib/hstor.c: avoid an unconditional leak in append_qparam

2010-09-27 Thread Jeff Garzik

On 09/27/2010 12:29 PM, Pete Zaitcev wrote:

On Mon, 27 Sep 2010 10:53:06 +0200
Jim Meyering <j...@meyering.net> wrote:


-	stmp = huri_field_escape(strdup(val), QUERY_ESCAPE_MASK);
+	v = strdup(val);
+	stmp = huri_field_escape(v, QUERY_ESCAPE_MASK);
 	str = g_string_append(str, stmp);
 	free(stmp);
+	free(v);


I think you may be fooled by the ridiculous calling convention


Doh, my memory and I were fooled, too.




Re: [hail patch 1/1] Fix calling convention of huri_field_escape

2010-09-27 Thread Jeff Garzik

On 09/27/2010 08:49 PM, Pete Zaitcev wrote:

Premature optimization is the root of all evil.

Use a sensible convention of not screwing with the argument, at the expense
of extra strdup.

Fortunately, all users are confined to Hail itself, even if huri_field_escape
is exported.
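The convention Pete adopts here (leave the argument alone, return a fresh allocation the caller frees) can be shown with a small percent-encoder. This is a sketch only: `field_escape_dup` is hypothetical, its signature is not hail's, and it escapes every byte outside the RFC 3986 unreserved set rather than honoring a mask such as QUERY_ESCAPE_MASK.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical escaper with a "don't touch the argument" convention:
 * s is only read, and the result is a new malloc'd string that the
 * caller must free.  Bytes outside the RFC 3986 unreserved set
 * (A-Z a-z 0-9 - _ . ~) become %XX.  Returns NULL on allocation failure.
 */
static char *field_escape_dup(const char *s)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t len = strlen(s);
	char *out, *p;

	if (len > (SIZE_MAX - 1) / 3)
		return NULL;		/* worst case: every byte becomes %XX */
	out = malloc(len * 3 + 1);
	if (!out)
		return NULL;

	for (p = out; *s; s++) {
		unsigned char c = (unsigned char) *s;

		if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		    (c >= '0' && c <= '9') || c == '-' || c == '_' ||
		    c == '.' || c == '~') {
			*p++ = (char) c;
		} else {
			*p++ = '%';
			*p++ = hex[c >> 4];
			*p++ = hex[c & 0x0f];
		}
	}
	*p = '\0';
	return out;
}
```

With this convention a caller such as append_qparam can pass val directly and free only the returned string, leaving no separate strdup copy to leak or double-free.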

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  include/hstor.h |2 +-
  lib/hstor.c |   44 +---
  lib/huri.c  |   10 +-
  3 files changed, 35 insertions(+), 21 deletions(-)


applied




Re: [tabled patch 1/1] Add a test for hstor_keys

2010-09-27 Thread Jeff Garzik

On 09/27/2010 08:52 PM, Pete Zaitcev wrote:

Our current tests do not invoke hstor_keys at all, and so they did not catch
a crash with double free in append_qparam.

Add a very basic test which at least calls hstor_keys to verify that it
does not crash right away. This test does not exercise complex modes
such as S3 paging, but better this than nothing.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  test/Makefile.am |4 +
  test/list-keys.c |  102 +
  2 files changed, 105 insertions(+), 1 deletion(-)


applied.  FYI, you forgot to update test/.gitignore.  Fixed.




Re: [PATCH tabled 1/2] server/config.c: don't dereference NULL on OOM

2010-09-24 Thread Jeff Garzik

On 09/24/2010 07:32 AM, Jim Meyering wrote:

You can pull from the oom branch here:
   git://git.infradead.org/users/meyering/tabled.git



Got nearly everything perfect.  Need one more minor yet important 
change.  As described in doc/contributions.txt, every changeset MUST 
have a Signed-off-by line at the end of a changeset's description.


I was able to pull and build just fine, so your git repo setup and push 
appears correct.


Also, in your pull request, please put the branch immediately following 
the repo URL on the same line, for easier cut-n-paste.  Here's how Linus 
requests his pull-requests to look:


---SNIP-
Please pull from 'upstream-linus' branch of
git://git.kernel.org/pub/scm/git/jgarzik/libata-dev.git upstream-linus

to receive the following updates:

 drivers/ata/ahci.c        |  193 +++--
 drivers/ata/libata-acpi.c |   40 +-
 drivers/ata/libata-core.c |3 +
 drivers/ata/libata.h  |2 +
 drivers/ata/pata_ali.c|2 +-
 include/linux/ata.h   |9 ++-
 include/linux/libata.h|   12 +++
 7 files changed, 178 insertions(+), 83 deletions(-)

Dirk Hohndel (1):
  pata_ali: trivial fix of a very frequent spelling mistake

Robert Hancock (1):
  ahci: display all AHCI 1.3 HBA capability flags (v2)

Tejun Heo (5):
  ahci: disable 64bit DMA by default on SB600s
  libata: cosmetic updates
  libata: implement more acpi filtering options
  libata: make gtf_filter per-dev
  ahci: filter FPDMA non-zero offset enable for Aspire 3810T

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index acd1162..4edca6e 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
[COMBINED PATCH FOLLOWS...]

---SNIP-


Re: [PATCH tabled 1/2] server/config.c: don't dereference NULL on OOM

2010-09-24 Thread Jeff Garzik

On 09/24/2010 01:43 PM, Jim Meyering wrote:

Jeff Garzik wrote:

On 09/24/2010 07:32 AM, Jim Meyering wrote:

You can pull from the oom branch here:
git://git.infradead.org/users/meyering/tabled.git


Got nearly everything perfect.  Need one more minor yet important
change.  As described in doc/contributions.txt, every changeset MUST
have a Signed-off-by line at the end of a changeset's description.

I was able to pull and build just fine, so your git repo setup and
push appears correct.

Also, in your pull request, please put the branch immediately
following the repo URL on the same line, for easier cut-n-paste.
Here's how Linus requests his pull-requests to look:


Ok.  I've added those pesky S.O.B lines with this:

   git filter-branch --msg-filter \
     'cat && printf "\nSigned-off-by: Jim Meyering <meyer...@redhat.com>\n"' \
     HEAD~4..HEAD

and pushed the result.

Please pull from the 'oom' branch of
git://git.infradead.org/users/meyering/tabled.git


pulled from you & pushed upstream, thanks!




Re: [PATCH tabled] server/server.c (net_write_port): Don't ignore write error.

2010-09-23 Thread Jeff Garzik

On 09/23/2010 03:55 AM, Jim Meyering wrote:

Better safe than sorry...
Unreported write failures can be unpleasant.
I fixed the one below so that a failure indication
can propagate up the call tree.  You might also want to
report the failure to stderr.
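Propagating write failures applies to any small state file. Below is a sketch of writing a port number so that every failure reaches the caller; `write_port_file` is a hypothetical name and not tabled's net_write_port. Note that fclose has to be checked too, because buffered output is only flushed there and an error such as a full disk can surface at that point.

```c
#include <stdio.h>

/*
 * Hypothetical: write "<port>\n" to path.  Returns 0 on success and -1
 * if opening, formatting or the final flush in fclose() fails, so the
 * caller can propagate the error up the call tree.
 */
static int write_port_file(const char *path, int port)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", port) < 0)
		rc = -1;
	if (fclose(f) != 0)	/* deferred write errors show up here */
		rc = -1;
	return rc;
}
```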

I let my editor automatically update the copyright date
and remove trailing spaces.
If you'd rather separate those from the fix,
let me know and I can adjust and resend.


Patch applied, thanks.

The typical preference is to receive whitespace and other cosmetic 
changes in a separate patch, thereby highlighting the functional changes.


But we're not so strict here that I would reject an otherwise useful 
patch...


Also FWIW, we're not very strict about reproducing the GCC-ish 
(GNU-ish?) style of $FILENAME ($FUNCTION): in each changelog -- though 
you're certainly welcome to continue, if that's your preference.


Given that git show $COMMIT will give you filename and per-diff-chunk 
function names, reproducing that in the git changelog entry seems 
somewhat redundant.  A simple, English-language summary of the change is 
fine.  Just a style tip, though, feel free to ignore!  :)


Jeff




[tabled patch v2] abstract out TCP-write code

2010-09-23 Thread Jeff Garzik

Changes from v1:
- avoid referencing dead struct client (grep for 'invalidate_cli'),
  by changing FSM callback prototype.
- insert 'void *priv' member into struct atcp_wr_state, and replace
  cb_data1/cb_data2 callback parameters with (struct atcp_wr_state *, void *).
  struct client / struct session, or whatever, may be stored in
  atcp_wr_state::priv.
- minor API polishing and further abstraction
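The completion loop in atcp_writable has to cope with short writes: writev may send fewer bytes than were queued, possibly stopping partway through one buffer. The accounting can be sketched on a plain array; `struct pending_wr` and `consume_written` are hypothetical stand-ins for the list-based struct atcp_write queue in the patch.

```c
#include <stddef.h>

/* Hypothetical pending-write record: the unsent part of one buffer. */
struct pending_wr {
	const char	*buf;	/* next byte still to be sent */
	size_t		togo;	/* bytes of this buffer not yet sent */
};

/*
 * Account for rc bytes that one writev() call actually sent from the
 * queue q[0..n).  Fully sent entries are stepped over (a real
 * implementation would move them to a completion queue); a partially
 * sent entry has buf/togo advanced.  Returns the index of the first
 * entry that still has data pending (n if everything was sent).
 */
static size_t consume_written(struct pending_wr *q, size_t n, size_t rc)
{
	size_t i = 0;

	while (i < n && rc > 0) {
		size_t sz = (q[i].togo < rc) ? q[i].togo : rc;

		q[i].buf += sz;
		q[i].togo -= sz;
		rc -= sz;
		if (q[i].togo > 0)
			break;		/* short write ended inside this buffer */
		i++;
	}
	return i;
}
```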

 server/Makefile.am |1 
 server/atcp.c  |  238 +++
 server/atcp.h  |  100 +++
 server/bucket.c|8 -
 server/object.c|   56 +--
 server/server.c|  268 +
 server/status.c|3 
 server/tabled.h|   46 ++---
 8 files changed, 436 insertions(+), 284 deletions(-)

diff --git a/server/Makefile.am b/server/Makefile.am
index 5b53a0a..5e0abd5 100644
--- a/server/Makefile.am
+++ b/server/Makefile.am
@@ -4,6 +4,7 @@ INCLUDES= -I$(top_srcdir)/include @GLIB_CFLAGS@ @HAIL_CFLAGS@
 sbin_PROGRAMS  = tabled tdbadm
 
 tabled_SOURCES = tabled.h  \
+ atcp.c atcp.h \
  bucket.c cldu.c config.c metarep.c object.c replica.c \
  server.c status.c storage.c storparse.c util.c
 tabled_LDADD   = ../lib/libtdb.a   \
diff --git a/server/atcp.c b/server/atcp.c
new file mode 100644
index 000..0050a68
--- /dev/null
+++ b/server/atcp.c
@@ -0,0 +1,238 @@
+
+/*
+ * Copyright 2010 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ */
+
+#define _GNU_SOURCE
+#include "tabled-config.h"
+
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sys/uio.h>
+#include "atcp.h"
+
+bool atcp_cb_free(struct atcp_wr_state *wst, void *cb_data, bool done)
+{
+   free(cb_data);
+   return false;
+}
+
+static void atcp_write_complete(struct atcp_write *tmp)
+{
+   struct atcp_wr_state *wst = tmp->wst;
+
+   list_del(&tmp->node);
+   list_add_tail(&tmp->node, &wst->write_compl_q);
+}
+
+static bool atcp_write_free(struct atcp_write *tmp, bool done)
+{
+   struct atcp_wr_state *wst = tmp->wst;
+   bool rcb = false;
+
+   wst->write_cnt -= tmp->length;
+   list_del_init(&tmp->node);
+   if (tmp->cb)
+       rcb = tmp->cb(wst, tmp->cb_data, done);
+   free(tmp);
+
+   return rcb;
+}
+
+bool atcp_write_run_compl(struct atcp_wr_state *wst)
+{
+   struct atcp_write *wr;
+   bool do_loop;
+
+   do_loop = false;
+   while (!list_empty(wst-write_compl_q)) {
+   wr = list_entry(wst-write_compl_q.next,
+   struct atcp_write, node);
+   do_loop |= atcp_write_free(wr, true);
+   }
+   return do_loop;
+}
+
+void atcp_write_free_all(struct atcp_wr_state *wst)
+{
+   struct atcp_write *wr, *tmp;
+
+   atcp_write_run_compl(wst);
+   list_for_each_entry_safe(wr, tmp, wst-write_q, node) {
+   atcp_write_free(wr, false);
+   }
+}
+
+static bool atcp_writable(struct atcp_wr_state *wst)
+{
+   int n_iov;
+   struct atcp_write *tmp;
+   ssize_t rc;
+   struct iovec iov[ATCP_MAX_WR_IOV];
+
+   /* accumulate pending writes into iovec */
+   n_iov = 0;
+   list_for_each_entry(tmp, wst-write_q, node) {
+   if (n_iov == ATCP_MAX_WR_IOV)
+   break;
+   /* bleh, struct iovec should declare iov_base const */
+   iov[n_iov].iov_base = (void *) tmp-buf;
+   iov[n_iov].iov_len = tmp-togo;
+   n_iov++;
+   }
+
+   /* execute non-blocking write */
+do_write:
+   rc = writev(wst-fd, iov, n_iov);
+   if (rc  0) {
+   if (errno == EINTR)
+   goto do_write;
+   if (errno != EAGAIN)
+   goto err_out;
+   return true;
+   }
+
+   /* iterate through write queue, issuing completions based on
+* amount of data written
+*/
+   while (rc  0) {
+   int sz;
+
+   /* get pointer to first record on list */
+   tmp = list_entry(wst-write_q.next, struct atcp_write, node);
+
+   /* mark data consumed by decreasing tmp-len */
+   sz = (tmp-togo  rc) ? tmp-togo : rc;
+   tmp-togo -= sz;
+   tmp-buf += 

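The completion loop at the end of atcp_writable() (cut off in the archive) charges the byte count returned by writev() against the queued buffers, oldest first: fully sent buffers are completed and a partially sent one is advanced. A minimal self-contained sketch of that accounting, using a plain array instead of the list_head queue; the struct and function names are hypothetical, not tabled's:

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct atcp_write: one pending buffer. */
struct pending_write {
	const char *buf;	/* next byte still to be sent */
	size_t togo;		/* bytes of this buffer not yet written */
};

/*
 * Charge 'rc' bytes (the return value of a successful writev()) against
 * the queue in order.  Returns how many leading entries are now fully
 * written and could be moved to the completion queue.
 */
static size_t consume_written(struct pending_write *q, size_t n_q, ssize_t rc)
{
	size_t done = 0;

	while (rc > 0 && done < n_q) {
		struct pending_write *w = &q[done];
		size_t sz = (w->togo < (size_t) rc) ? w->togo : (size_t) rc;

		w->togo -= sz;		/* mark data consumed */
		w->buf += sz;		/* resume here on the next writev() */
		rc -= (ssize_t) sz;

		if (w->togo == 0)
			done++;		/* whole buffer sent: complete it */
	}
	return done;
}
```

A short write leaves the partially sent buffer at the head of the queue, so the next writable event rebuilds the iovec starting from its updated buf/togo.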
Re: [tabled patch] abstract out TCP-write code

2010-09-23 Thread Jeff Garzik

On 09/23/2010 11:28 AM, Jim Meyering wrote:

Every developer should have MALLOC_PERTURB_=N (N in 1..255) set in
his/her environment on glibc-based systems.  Almost all the time.


I heard about it a while ago, and even submitted a bugzilla bug to have it
documented adequately.  But apparently it's absent from my .bash_profile.
Added.


Jeff



--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [tabled patch] abstract out TCP-write code

2010-09-23 Thread Jeff Garzik

On 09/22/2010 10:37 PM, Pete Zaitcev wrote:

On Wed, 22 Sep 2010 21:26:13 -0400
Jeff Garzik <j...@garzik.org> wrote:

It is a common idiom even in GLib that callbacks receive two anonymous
pointers; witness the data type GFunc's 'data' and 'user_data'
arguments:
http://library.gnome.org/devel/glib/stable/glib-Doubly-Linked-Lists.html#GFunc


There's a lot of retarged garbage in Glib, just look at their lists.
If someone smarter wrote Glib, we would not need struct list_head.


I use both list types, because there's a use case for both.  You don't 
always have the luxury of having a struct in which to embed data+next 
pointers.  Allocated strings are an excellent example.


GFunc has two parameters for a reason :)  See for example 
http://library.gnome.org/devel/glib/stable/glib-Doubly-Linked-Lists.html#g-list-foreach


It really is a common idiom, based on a common need, not just my style 
preference.  :)
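GLib's g_list_foreach(list, func, user_data) calls func(element_data, user_data) for each element, which is why GFunc takes two anonymous pointers: one for the element and one for caller-supplied context. A minimal sketch of the same idiom without GLib, on a hypothetical list that stores payload pointers (useful for things like allocated strings, where there is no struct to embed a list_head in):

```c
#include <stddef.h>
#include <string.h>

/* Same shape as GLib's GFunc: per-element data plus caller's user_data. */
typedef void (*foreach_func)(void *data, void *user_data);

/* Hypothetical list node that points at its payload instead of being
 * embedded in it (the opposite of struct list_head). */
struct ptr_node {
	void *data;
	struct ptr_node *next;
};

static void ptr_list_foreach(struct ptr_node *head, foreach_func func,
			     void *user_data)
{
	for (struct ptr_node *n = head; n; n = n->next)
		func(n->data, user_data);
}

/* Example callback: accumulate each string's length into *user_data. */
static void add_len(void *data, void *user_data)
{
	size_t *total = user_data;

	*total += strlen((const char *) data);
}
```

Without the second pointer, add_len() would need a global to hold its running total.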


Jeff


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [tabled patch] abstract out TCP-write code

2010-09-22 Thread Jeff Garzik

On 09/22/2010 10:37 PM, Pete Zaitcev wrote:

On Wed, 22 Sep 2010 21:26:13 -0400
Jeff Garzik <j...@garzik.org> wrote:


  So, we go a longer route and re-hook the list of completions
  to a per-server global instead of a client. The patch is
  straightforward. The only thing we need to be careful about is
  making sure that no outstanding completions are left in the queue
  before freeing a client struct. This is ensured by force-running
  completions.



Looking at this change again, I don't see how this avoids
use-after-free.  If completions still exist after the state change
function leads to cli_evt_dispose() -> cli_free(), then
cli_write_run_compl() still calls cli_write_free() with the stale
'cli' pointer.


We run completions before freeing in all cases. My patch was correct.


Logically, if completions are run before freeing in all cases, there is 
no need to make write_compl_q global.  That was a red herring, which by 
side effect avoided the bug with the stale 'cli' pointer.


Jeff


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reconsidering libevent

2010-09-21 Thread Jeff Garzik
On Tue, Sep 21, 2010 at 5:06 PM, Steven Dake sd...@redhat.com wrote:
 libevent version 2 has proper mutual exclusion, but the code needs some
 work.

1.x should work for chunkd at the moment.  I need to resist my own
urge to think too far ahead and overengineer for the future sometimes;
I think this is one of those occasions.  libevent 1.x seems solid for
single-thread usage, and that's how we'll use libevent, even if
multiple chunkd threads are in existence.

 Jeff
--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [hail patch 0/3] chunkd: on-disk checksumming and get-partial operation

2010-09-15 Thread Jeff Garzik

Just pushed this out to hail.git.

--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't expect inode name to be NUL-terminated (avoid read overrun)

2010-09-14 Thread Jeff Garzik

On 09/10/2010 08:55 AM, Jim Meyering wrote:


* server/msg.c (msg_get): Copy only name_len bytes, then NUL-terminate,
rather than using snprintf to copy up to and including nonexistent NUL.
---

valgrind exposed this.  The use of snprintf would have been
correct if the inode name buffer (following the struct raw_inode)
were NUL-terminated, but it is not.


applied -- good catch
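The fix described in the commit message copies exactly name_len bytes and then adds the terminator, instead of letting snprintf("%s") scan for a NUL that is not there. A minimal sketch of that pattern as a hypothetical standalone helper (not the actual msg_get() code):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate a length-counted name whose source buffer is NOT
 * NUL-terminated.  Copy exactly name_len bytes, then terminate.
 * Never scan the source for a NUL: snprintf("%s") or strcpy() would
 * read past the end of the buffer.
 */
static char *dup_counted_name(const char *src, size_t name_len)
{
	char *s = malloc(name_len + 1);

	if (!s)
		return NULL;
	memcpy(s, src, name_len);
	s[name_len] = '\0';
	return s;
}
```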

out of curiosity, what is your patch base?

We combined cld and chunkd into a single 'hail' pkg, and from the 
pathname, your patch was generated from the older cld pkg.  We'd like to 
find the source and replace cld/chunkd with 'hail'.


F12?  F13?  rawhide?

Thanks,

Jeff




--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[hail patch 0/3] chunkd: on-disk checksumming and get-partial operation

2010-09-14 Thread Jeff Garzik

This patchset is just about ready to go upstream.  Just need to write a
couple tests (familiar refrain eh?:)).

These changes add a new Get-Partial-Object (GET_PART) chunkd operation.

GET_PART permits partial retrieval of an object, by adding an
(offset,length) pair to the standard Get-Object (GET) operation.
length==0 is special-cased as meaning retrieve until end of object.

The maximum number of bytes that may be requested in a single GET_PART
request is 4 x 64k blocks (256k).  Larger lengths will be truncated
down to the maximum.

Because we currently store only whole-object SHA1 checksums, we cannot
verify that on-disk data is valid when retrieving a subset of an object.
Thus, a necessary prerequisite of GET_PART is changing the checksum
scheme, which is done as follows:

* objects are defined as runs of 64k logical blocks
* checksums are stored on-disk for each 64k in an object
* Rather than returning the stored SHA1 checksum, which serves
  to verify both on-disk and network integrity, we break this
  into two steps,
* verify per-64k checksums at GET_PART time
* generate on-the-fly SHA1 checksum for GET_PART
  returned data
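With one checksum per 64k block, the on-disk checksum table for an object holds one entry per block, rounding a partial tail block up. A minimal sketch of that sizing, mirroring the fs_blk_count() helper in the patches below. The 20-byte entry size is an assumption (one SHA1 digest); chunkd uses CHD_CSUM_SZ for this:

```c
#include <stddef.h>
#include <stdint.h>

enum {
	CHUNK_BLK_ORDER = 16,			/* 64k blocks, as in chunkd */
	CHUNK_BLK_SZ    = 1 << CHUNK_BLK_ORDER,
	CSUM_SZ         = 20,			/* assumed: one SHA1 digest */
};

/* Number of 64k blocks needed to hold data_len bytes (rounded up). */
static unsigned int blk_count(uint64_t data_len)
{
	uint64_t n_blk = data_len >> CHUNK_BLK_ORDER;

	if (data_len & (CHUNK_BLK_SZ - 1))
		n_blk++;			/* partial tail block */
	return (unsigned int) n_blk;
}

/* Bytes of checksum table needed for an object of data_len bytes. */
static size_t csum_tbl_bytes(uint64_t data_len)
{
	return (size_t) blk_count(data_len) * CSUM_SZ;
}
```

For example, a 200000-byte object spans four blocks (three full ones plus a 3392-byte tail), so its table holds four entries.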

The chunkd network protocol supports any offset/length, including
not-64k-aligned values.  However, failure to align GET_PART requests on
64k boundaries will result in reduced performance, due to additional
work chunkd must perform [and then throw away], because chunkd now works
in 64k chunks internally.
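Putting the GET_PART rules together: length==0 means "to end of object", lengths above 256k are truncated, and the server must read and verify every 64k block the range touches. A minimal sketch of that normalization. It is hypothetical, not chunkd's code. It assumes offset < obj_len, and it also clamps a range running past the end of the object, which the description above does not specify:

```c
#include <stdint.h>

enum {
	BLK_ORDER    = 16,			/* 64k blocks */
	BLK_SZ       = 1 << BLK_ORDER,
	GET_PART_MAX = 4 * BLK_SZ,		/* 256k per request */
};

struct part_span {
	uint64_t len;		/* bytes returned to the client */
	uint64_t blk_first;	/* first block read and verified */
	uint64_t blk_last;	/* last block (inclusive) */
	uint64_t waste;		/* bytes read and verified but not returned */
};

/* Assumes 0 <= offset < obj_len. */
static struct part_span get_part_span(uint64_t obj_len, uint64_t offset,
				      uint64_t length)
{
	struct part_span s;
	uint64_t end, covered;

	if (length == 0 || length > obj_len - offset)
		length = obj_len - offset;	/* 0 means "to end of object" */
	if (length > GET_PART_MAX)
		length = GET_PART_MAX;		/* truncated to the maximum */

	end = offset + length;			/* exclusive */
	s.len = length;
	s.blk_first = offset >> BLK_ORDER;
	s.blk_last = (end - 1) >> BLK_ORDER;

	/* bytes in the touched blocks, allowing for a short tail block */
	covered = (s.blk_last - s.blk_first + 1) << BLK_ORDER;
	if (((s.blk_last + 1) << BLK_ORDER) > obj_len)
		covered -= ((s.blk_last + 1) << BLK_ORDER) - obj_len;
	s.waste = covered - length;
	return s;
}
```

A 1000-byte read at offset 100 still reads and checksums a whole 64k block, wasting 64536 bytes. This is the performance cost of unaligned requests described above.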

This is a major protocol milestone, and should immediately enable sane
usage by nfs4d and itd (see wiki if unfamiliar), and will hopefully
benefit tabled too.

--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[hail patch 1/3] chunkd: Add checksum table to on-disk format

2010-09-14 Thread Jeff Garzik

commit f1de17a6e2b3afdbfbfa581228280b65a4a17e5f
Author: Jeff Garzik <j...@garzik.org>
Date:   Thu Aug 5 17:47:03 2010 -0400

chunkd: Add checksum table to on-disk format, one sum per 64k of data

Signed-off-by: Jeff Garzik <jgar...@redhat.com>

 chunkd/be-fs.c |  162 -
 1 file changed, 137 insertions(+), 25 deletions(-)

diff --git a/chunkd/be-fs.c b/chunkd/be-fs.c
index 4b851a7..d714e7c 100644
--- a/chunkd/be-fs.c
+++ b/chunkd/be-fs.c
@@ -53,14 +53,23 @@ struct fs_obj {
 	int			in_fd;
 	char			*in_fn;
 	off_t			sendfile_ofs;
+
+	size_t			checked_bytes;
+	SHA_CTX			checksum;
+	unsigned int		csum_idx;
+	void			*csum_tbl;
+	size_t			csum_tbl_sz;
+
+	unsigned int		n_blk;
 };
 
 struct be_fs_obj_hdr {
 	char			magic[4];
 	uint32_t		key_len;
 	uint64_t		value_len;
+	uint32_t		n_blk;
 
-	char			reserved[16];
+	char			reserved[12];
 
 	unsigned char		hash[CHD_CSUM_SZ];
 	char			owner[128];
@@ -208,6 +217,8 @@ static struct fs_obj *fs_obj_alloc(void)
 	obj->out_fd = -1;
 	obj->in_fd = -1;
 
+	SHA1_Init(&obj->checksum);
+
return obj;
 }
 
@@ -318,6 +329,17 @@ static bool key_valid(const void *key, size_t key_len)
return true;
 }
 
+static unsigned int fs_blk_count(uint64_t data_len)
+{
+   uint64_t n_blk;
+
+	n_blk = data_len >> CHUNK_BLK_ORDER;
+	if (data_len & (CHUNK_BLK_SZ - 1))
+		n_blk++;
+
+   return (unsigned int) n_blk;
+}
+
 struct backend_obj *fs_obj_new(uint32_t table_id,
   const void *key, size_t key_len,
   uint64_t data_len,
@@ -325,6 +347,7 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 {
struct fs_obj *obj;
char *fn = NULL;
+   size_t csum_bytes;
enum chunk_errcode erc = che_InternalError;
off_t skip_len;
 
@@ -339,6 +362,13 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
return NULL;
}
 
+	obj->n_blk = fs_blk_count(data_len);
+	csum_bytes = obj->n_blk * CHD_CSUM_SZ;
+	obj->csum_tbl = malloc(csum_bytes);
+	if (!obj->csum_tbl)
+		goto err_out;
+	obj->csum_tbl_sz = csum_bytes;
+
/* build local fs pathname */
fn = fs_obj_pathname(table_id, key, key_len);
if (!fn)
@@ -359,7 +389,7 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 	obj->out_fn = fn;
 
/* calculate size of front-of-file metadata area */
-   skip_len = sizeof(struct be_fs_obj_hdr) + key_len;
+   skip_len = sizeof(struct be_fs_obj_hdr) + key_len + csum_bytes;
 
/* position file pointer where object data (as in, not metadata)
 * will begin
@@ -397,7 +427,10 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
struct be_fs_obj_hdr hdr;
ssize_t rrc;
uint64_t value_len, tmp64;
+   size_t csum_bytes;
enum chunk_errcode erc = che_InternalError;
+   struct iovec iov[2];
+   size_t total_rd_len;
 
if (!key_valid(key, key_len)) {
*err_code = che_InvalidKey;
@@ -457,23 +490,45 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
goto err_out;
 
value_len = GUINT64_FROM_LE(hdr.value_len);
+	obj->n_blk = GUINT32_FROM_LE(hdr.n_blk);
+	csum_bytes = obj->n_blk * CHD_CSUM_SZ;
 
 	/* verify file size large enough to contain value */
-	tmp64 = value_len + sizeof(hdr) + key_len;
+	tmp64 = value_len + sizeof(hdr) + key_len + csum_bytes;
 	if (G_UNLIKELY(st.st_size < tmp64)) {
 		applog(LOG_ERR, "obj(%s) size error, too small", obj->in_fn);
 		goto err_out;
 	}
 
+	/* verify expected size of checksum table */
+	if (G_UNLIKELY(fs_blk_count(value_len) != obj->n_blk)) {
+		applog(LOG_ERR, "obj(%s) unexpected blk count "
+		       "(%u from val sz, %u from hdr)",
+		       obj->in_fn, fs_blk_count(value_len), obj->n_blk);
+		goto err_out;
+	}
+
+	obj->csum_tbl = malloc(csum_bytes);
+	if (!obj->csum_tbl)
+		goto err_out;
+	obj->csum_tbl_sz = csum_bytes;
+
 	obj->bo.key = malloc(key_len);
 	obj->bo.key_len = key_len;
 	if (!obj->bo.key)
 		goto err_out;
 
-	/* read object variable-length header */
-	rrc = read(obj->in_fd, obj->bo.key, key_len);
-	if ((rrc != key_len) || (memcmp(key, obj->bo.key, key_len))) {
-		applog(LOG_ERR, "read hdr key obj(%s) failed: %s",
+	/* init additional header segment list */
+	iov[0].iov_base

Re: [tabled patch 4/5] Support auto replication port

2010-08-13 Thread Jeff Garzik

On 08/12/2010 03:22 PM, Pete Zaitcev wrote:

Allow random ports for replication master to listen on.

The patch is somewhat larger than expected, because before we had
the MASTER file written right after locking. Now we may have it
written without listening parameters, and the slaves must be
ready to deal with it.

Unlike the auto client port, we do not need to write any accessor
files, because we already report the host and port through CLD.

Listening on random ports has security implications.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>


applied 1-4 of 5
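The usual POSIX way to listen on a random port is to bind to port 0 and then ask the kernel, with getsockname(), which port it assigned. That port can then be published to peers (tabled publishes it through CLD). A minimal sketch of the mechanism only, not tabled's replication code. It binds to loopback to keep the example self-contained; a real server would bind the address it serves on:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Listen on a kernel-chosen port.  Returns the listening fd and stores
 * the port (host byte order) in *port_out, or returns -1 on error.
 */
static int listen_any_port(unsigned short *port_out)
{
	struct sockaddr_in sin;
	socklen_t slen = sizeof(sin);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;			/* 0 = let the kernel choose */

	if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) < 0 ||
	    listen(fd, 16) < 0 ||
	    getsockname(fd, (struct sockaddr *) &sin, &slen) < 0) {
		close(fd);
		return -1;
	}

	*port_out = ntohs(sin.sin_port);	/* the port to advertise */
	return fd;
}
```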


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [tabled patch 1/3] make a const struct static

2010-08-10 Thread Jeff Garzik

On 08/05/2010 11:40 PM, Pete Zaitcev wrote:

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  server/server.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

commit 93b990f68e5c2c652759a2db8af049d172b8489c
Author: Pete Zaitcev <zait...@yahoo.com>
Date:   Thu Aug 5 20:33:21 2010 -0600

 Make initialized struct a static const.


applied 1-2


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [hail patch 2/3] fix 32/64 wire interoperability

2010-08-05 Thread Jeff Garzik

On 08/04/2010 07:16 PM, Pete Zaitcev wrote:

Testing found that tabled and chunkd running on CPUs with different
word length cannot talk to each other.

The bug was introduced by commit ea5d20bc22aeed077312c9c1824e84651af17a16.

The fix is to add named padding that takes the place of the invisible
padding, thus making the layout platform-neutral.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  include/chunk_msg.h |1 +
  1 file changed, 1 insertion(+)

diff --git a/include/chunk_msg.h b/include/chunk_msg.h
index a34fc21..4c170e4 100644
--- a/include/chunk_msg.h
+++ b/include/chunk_msg.h
@@ -91,6 +91,7 @@ struct chunksrv_resp {
 	uint32_t		nonce;		/* txn id, copied from request */
 	uint64_t		data_len;	/* len of addn'l data */
 	unsigned char		hash[CHD_CSUM_SZ];	/* SHA1 checksum */
+	unsigned char		rsv2[4];	/* pad for 64 bits */
 };
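Why the named pad is needed: a uint64_t is 8-byte aligned on x86_64 but only 4-byte aligned inside structs on i386. The compiler therefore inserts different invisible padding on each, and sizeof() differs. A minimal sketch with hypothetical structs (not the real chunksrv_resp, whose leading fields are not shown above). Naming every pad byte makes the layout identical on both ABIs, and a static assertion catches regressions at compile time:

```c
#include <stddef.h>
#include <stdint.h>

/* Implicit padding: sizeof is 40 on x86_64 but 32 on i386. */
struct resp_implicit {
	uint32_t	nonce;
	uint64_t	data_len;	/* offset 8 on x86_64, 4 on i386 */
	unsigned char	hash[20];
};

/* Every pad byte named: the same layout on both ABIs. */
struct resp_explicit {
	uint32_t	nonce;
	uint32_t	rsv1;		/* named pad keeps data_len at offset 8 */
	uint64_t	data_len;
	unsigned char	hash[20];
	unsigned char	rsv2[4];	/* named tail pad, size a multiple of 8 */
};

_Static_assert(offsetof(struct resp_explicit, data_len) == 8,
	       "data_len must be at offset 8 on every ABI");
_Static_assert(sizeof(struct resp_explicit) == 40,
	       "wire size must not depend on word length");
```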


good catch.  applied 1-3, and pushed out.

I wonder if we shouldn't switch to attribute(packed) for safety, though.
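For comparison, the GCC/Clang packed attribute removes all implicit padding, so the layout is the same on every ABI without named pad fields. A minimal sketch on a hypothetical struct. Note the trade-offs: multi-byte fields may be misaligned, which costs speed and forces byte-wise access on strict-alignment CPUs. Also, packing an existing wire struct changes its size unless the explicit pad fields are kept, so it is itself a protocol change:

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: no implicit padding anywhere in the struct. */
struct resp_packed {
	uint32_t	nonce;
	uint64_t	data_len;	/* offset 4: misaligned on 64-bit */
	unsigned char	hash[20];
} __attribute__((packed));

_Static_assert(sizeof(struct resp_packed) == 32, "no implicit padding");
```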


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [hail patch 1/1] Make host, url, orig_path dynamic

2010-07-29 Thread Jeff Garzik

On 07/29/2010 01:41 PM, Pete Zaitcev wrote:

On Tue, 20 Jul 2010 16:34:19 -0400
Jeff Garzik <j...@garzik.org> wrote:


   lib/hstor.c |  147 +++---
   1 file changed, 104 insertions(+), 43 deletions(-)


applied


It's not in the git repo. Check this URL:
http://git.kernel.org/?p=daemon/distsrv/hail.git


Forgot to push, sorry.  It's there now.

Jeff



--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [hail patch 1/7] Drop old comments about chunkdc

2010-07-29 Thread Jeff Garzik

On 07/29/2010 10:49 PM, Pete Zaitcev wrote:

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  configure.ac |2 --
  1 file changed, 2 deletions(-)

commit 00be6055a3801ef8e84a4c78b43b43b67a76eab9
Author: Pete Zaitcev <zait...@yahoo.com>
Date:   Thu Jul 29 19:10:05 2010 -0600

 Drop comment for a dead library.


applied 1-7 to tabled repo


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [hail patch 1/1] Make host, url, orig_path dynamic

2010-07-20 Thread Jeff Garzik

On 07/20/2010 04:16 PM, Pete Zaitcev wrote:

Some of my performance tests for tabled hit truncation again:

  [zait...@hitlain tests]$ ./poke5 -v -h niphredil.zaitcev.lan -u auser
   -p apass -b test -o -k testkey-hitlain/73b84a11e6d83c65e45853338d646042
   -f testdir/73b84a11e6d83c65e45853338d646042
  * About to connect() to niphredil.zaitcev.lan port 80 (#0)
  *   Trying fec0::1:219:b9ff:fe58:7ad6... * TCP_NODELAY set
  * connected
  * Connected to niphredil.zaitcev.lan (fec0::1:219:b9ff:fe59:7ad6) port 80 (#0)
PUT /test/testkey-hitlain/73b84a11e6d83c65e45853338d HTTP/1.1
  Accept: */*
  Host: niphredil.zaitcev.lan
  Date: Tue, 20 Jul 2010 01:07:33 +
  Authorization: AWS testuser:RefcbVYgr2m9KTRxOrCfr4zzfPE=
  Content-Length: 214745088
  Expect: 100-continue

  * The requested URL returned error: 403

As you can see, the path in PUT is truncated, and this causes a 403,
since the path is included in the signed hash.

The patch addresses this issue, and also converts a number of other
fixed-size strings before we hit the same problem with them.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  lib/hstor.c |  147 +++---
  1 file changed, 104 insertions(+), 43 deletions(-)


applied
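The general cure for fixed-size string truncation is to measure first and then allocate. vsnprintf(NULL, 0, ...) returns the length the formatted string would have, so the buffer can be sized exactly. A minimal sketch as a hypothetical helper (not the hstor.c code); glibc's asprintf() does the same job where it is available:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a heap buffer sized for the result, instead of a fixed
 * char[N] that silently truncates long object keys.  Caller frees.
 */
static char *fmt_alloc(const char *fmt, ...)
{
	va_list ap;
	int len;
	char *s;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);	/* measure only */
	va_end(ap);
	if (len < 0)
		return NULL;

	s = malloc((size_t) len + 1);
	if (!s)
		return NULL;

	va_start(ap, fmt);			/* a va_list cannot be reused */
	vsnprintf(s, (size_t) len + 1, fmt, ap);
	va_end(ap);
	return s;
}
```

For example, fmt_alloc("/%s/%s", bucket, key) yields the full request path however long the key is.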

--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] chunkd checksums each block, as it is read from disk

2010-07-18 Thread Jeff Garzik

Note that we are checksumming hot cache data, so SHA1 isn't as
punishing as one might think.


 chunkd/be-fs.c |   51 ++-
 1 file changed, 50 insertions(+), 1 deletion(-)

commit 2211e3b58620093866be4130397cb3b476620725
Author: Jeff Garzik <j...@garzik.org>
Date:   Sun Jul 18 03:03:35 2010 -0400

[chunkd] checksum data prior to returning via GET

When reading a file off disk, checksum the data after reading from
disk, prior to sending across network to client.  Fail read, if
checksum fails.

This guarantees we will never send corrupted data to a client.

Signed-off-by: Jeff Garzik <jgar...@redhat.com>

diff --git a/chunkd/be-fs.c b/chunkd/be-fs.c
index 2120991..dce2561 100644
--- a/chunkd/be-fs.c
+++ b/chunkd/be-fs.c
@@ -49,6 +49,10 @@ struct fs_obj {
 
 	int			in_fd;
 	char			*in_fn;
+	off_t			in_pos;
+
+	off_t			tail_pos;
+	size_t			tail_len;
 
 	size_t			checked_bytes;
 	SHA_CTX			checksum;
@@ -364,6 +368,8 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 	if (!obj->csum_tbl)
 		goto err_out;
 	obj->csum_tbl_sz = csum_bytes;
+	obj->tail_pos = data_len & ~(CHUNK_BLK_SZ - 1);
+	obj->tail_len = data_len & (CHUNK_BLK_SZ - 1);
 
/* build local fs pathname */
fn = fs_obj_pathname(table_id, key, key_len);
@@ -488,6 +494,8 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
value_len = GUINT64_FROM_LE(hdr.value_len);
 	obj->n_blk = GUINT32_FROM_LE(hdr.n_blk);
 	csum_bytes = obj->n_blk * CHD_CSUM_SZ;
+	obj->tail_pos = value_len & ~(CHUNK_BLK_SZ - 1);
+	obj->tail_len = value_len & (CHUNK_BLK_SZ - 1);
 
/* verify file size large enough to contain value */
tmp64 = value_len + sizeof(hdr) + key_len + csum_bytes;
@@ -571,15 +579,56 @@ void fs_obj_free(struct backend_obj *bo)
free(obj);
 }
 
+static bool can_csum_blk(struct fs_obj *obj, size_t len)
+{
+	if (obj->in_pos & (CHUNK_BLK_SZ - 1))
+		return false;
+
+	if (obj->in_pos == obj->tail_pos && len == obj->tail_len)
+		return true;
+	if (len == CHUNK_BLK_SZ)
+		return true;
+
+	return false;
+}
+
 ssize_t fs_obj_read(struct backend_obj *bo, void *ptr, size_t len)
 {
 	struct fs_obj *obj = bo->private;
 	ssize_t rc;
 
 	rc = read(obj->in_fd, ptr, len);
-	if (rc < 0)
+	if (rc < 0) {
 		applog(LOG_ERR, "obj read(%s) failed: %s",
 		       obj->in_fn, strerror(errno));
+		return -errno;
+	}
+
+	if (can_csum_blk(obj, rc)) {
+		unsigned char md[CHD_CSUM_SZ];
+		unsigned int blk_pos;
+		int cmprc;
+
+		SHA1(ptr, rc, md);
+
+		blk_pos = (unsigned int) (obj->in_pos >> CHUNK_BLK_ORDER);
+		cmprc = memcmp(md, obj->csum_tbl + (blk_pos * CHD_CSUM_SZ),
+			       CHD_CSUM_SZ);
+
+		if (cmprc) {
+			applog(LOG_WARNING, "obj(%s) csum failed @ 0x%llx",
+			       obj->in_fn,
+			       (unsigned long long) obj->in_pos);
+			return -EIO;
+		}
+	} else {
+		applog(LOG_INFO, "obj(%s) unaligned read, 0x%x @ 0x%llx",
+		       obj->in_fn, len,
+		       (unsigned long long) obj->in_pos);
+		
+	}
+
+	obj->in_pos += rc;
 
 	return rc;
 }
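The verification rule in can_csum_blk() is that a read can be checked against one stored checksum only if it starts on a 64k boundary and covers exactly one full block, or exactly the short tail block. The tail block's position and length come from masking the object length. A minimal self-contained sketch of that logic. The reduced struct and helper names are hypothetical; the mask casts to uint64_t explicitly rather than relying on sign extension of the enum constant:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
	CHUNK_BLK_ORDER = 16,
	CHUNK_BLK_SZ    = 1 << CHUNK_BLK_ORDER,
};

/* Hypothetical reduced read state: just what can_csum_blk() needs. */
struct rd_state {
	uint64_t in_pos;	/* current data offset */
	uint64_t tail_pos;	/* offset of the final block */
	size_t   tail_len;	/* its length; 0 if data_len is 64k-aligned */
};

static void rd_state_init(struct rd_state *st, uint64_t data_len)
{
	st->in_pos = 0;
	st->tail_pos = data_len & ~(uint64_t) (CHUNK_BLK_SZ - 1);
	st->tail_len = (size_t) (data_len & (CHUNK_BLK_SZ - 1));
}

/* True if a read of len bytes at in_pos maps onto exactly one checksum. */
static bool can_csum_blk(const struct rd_state *st, size_t len)
{
	if (st->in_pos & (CHUNK_BLK_SZ - 1))
		return false;			/* not block-aligned */
	if (st->in_pos == st->tail_pos && len == st->tail_len)
		return true;			/* the short tail block */
	return len == CHUNK_BLK_SZ;		/* one full block */
}

/* Index of the checksum-table entry for the block at in_pos. */
static unsigned int csum_index(const struct rd_state *st)
{
	return (unsigned int) (st->in_pos >> CHUNK_BLK_ORDER);
}
```

Reads that fail the test are not verified in this patch; fs_obj_read() only logs them as "unaligned read".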
--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2] chunkd: remove sendfile(2) zero-copy support

2010-07-18 Thread Jeff Garzik

On 07/17/2010 11:45 PM, Steven Dake wrote:

On 07/16/2010 10:46 PM, Jeff Garzik wrote:

chunkd: remove sendfile(2) zero-copy support

chunkd will soon be checksumming data in main memory. That removes
the utility of a zero-copy interface which bypasses the on-heap
data requirement.

Signed-off-by: Jeff Garzik <jgar...@redhat.com>



We may be able to use vmsplice with sendfile (if Linux is the only target
platform). Haven't tried it myself, but the operations look promising
for achieving zero copy with sockets from memory addresses.


Even though the man pages say only for pipes, this syscall definitely 
works with TCP.  The big question:  is it actually faster than 
read()+write() ?


Years ago, I experimented with using some fancy new Linux-specific 
syscalls in a from-scratch implementation of cp(1).  It turned out that 
read()+write() was faster than other methods.


That was file-file copying.  It's probably worth investigating 
vmsplice() for our file-checksum-TCP case, definitely.
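The read()+write() baseline that replaces sendfile(2) passes every byte through a user-space buffer, which is where a checksum can be computed before sending. A minimal sketch of such a copy loop, illustrative only and not chunkd's code. It handles short writes and EINTR, which a plain write() call does not do for you:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy in_fd to out_fd through a user-space buffer until EOF.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_fd(int in_fd, int out_fd)
{
	char buf[65536];
	ssize_t total = 0;

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return total;		/* EOF */

		/* ... buf[0..n) is in memory here: verify/checksum it ... */

		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t) (n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;		/* short write: send the rest */
		}
		total += n;
	}
}
```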


Jeff



--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3 v2] chunkd: Add checksum table to on-disk format, one sum per 64k of data

2010-07-16 Thread Jeff Garzik

 chunkd/be-fs.c  |  145 +---
 chunkd/chunkd.h |3 +
 2 files changed, 131 insertions(+), 17 deletions(-)
commit 394109d5c2fc2d15d91c2d36eecd57594922c1b3
Author: Jeff Garzik <j...@garzik.org>
Date:   Sat Jul 17 01:05:15 2010 -0400

chunkd: Add checksum table to on-disk format, one sum per 64k of data

Signed-off-by: Jeff Garzik <jgar...@redhat.com>

diff --git a/chunkd/be-fs.c b/chunkd/be-fs.c
index 0a81134..5955afa 100644
--- a/chunkd/be-fs.c
+++ b/chunkd/be-fs.c
@@ -49,14 +49,23 @@ struct fs_obj {
 
 	int			in_fd;
 	char			*in_fn;
+
+	size_t			checked_bytes;
+	SHA_CTX			checksum;
+	unsigned int		csum_idx;
+	void			*csum_tbl;
+	size_t			csum_tbl_sz;
+
+	unsigned int		n_blk;
 };
 
 struct be_fs_obj_hdr {
 	char			magic[4];
 	uint32_t		key_len;
 	uint64_t		value_len;
+	uint32_t		n_blk;
 
-	char			reserved[16];
+	char			reserved[12];
 
 	unsigned char		hash[CHD_CSUM_SZ];
 	char			owner[128];
@@ -204,6 +213,8 @@ static struct fs_obj *fs_obj_alloc(void)
 	obj->out_fd = -1;
 	obj->in_fd = -1;
 
+	SHA1_Init(&obj->checksum);
+
return obj;
 }
 
@@ -314,6 +325,17 @@ static bool key_valid(const void *key, size_t key_len)
return true;
 }
 
+static unsigned int fs_blk_count(uint64_t data_len)
+{
+   uint64_t n_blk;
+
+	n_blk = data_len >> CHUNK_BLK_ORDER;
+	if (data_len & (CHUNK_BLK_SZ - 1))
+		n_blk++;
+
+   return (unsigned int) n_blk;
+}
+
 struct backend_obj *fs_obj_new(uint32_t table_id,
   const void *key, size_t key_len,
   uint64_t data_len,
@@ -321,6 +343,7 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 {
struct fs_obj *obj;
char *fn = NULL;
+   size_t csum_bytes;
enum chunk_errcode erc = che_InternalError;
off_t skip_len;
 
@@ -335,6 +358,13 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
return NULL;
}
 
+	obj->n_blk = fs_blk_count(data_len);
+	csum_bytes = obj->n_blk * CHD_CSUM_SZ;
+	obj->csum_tbl = malloc(csum_bytes);
+	if (!obj->csum_tbl)
+		goto err_out;
+	obj->csum_tbl_sz = csum_bytes;
+
/* build local fs pathname */
fn = fs_obj_pathname(table_id, key, key_len);
if (!fn)
@@ -355,7 +385,7 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 	obj->out_fn = fn;
 
/* calculate size of front-of-file metadata area */
-   skip_len = sizeof(struct be_fs_obj_hdr) + key_len;
+   skip_len = sizeof(struct be_fs_obj_hdr) + key_len + csum_bytes;
 
/* position file pointer where object data (as in, not metadata)
 * will begin
@@ -393,7 +423,10 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
struct be_fs_obj_hdr hdr;
ssize_t rrc;
uint64_t value_len, tmp64;
+   size_t csum_bytes;
enum chunk_errcode erc = che_InternalError;
+   struct iovec iov[2];
+   size_t total_rd_len;
 
if (!key_valid(key, key_len)) {
*err_code = che_InvalidKey;
@@ -453,23 +486,45 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
goto err_out;
 
value_len = GUINT64_FROM_LE(hdr.value_len);
+	obj->n_blk = GUINT32_FROM_LE(hdr.n_blk);
+	csum_bytes = obj->n_blk * CHD_CSUM_SZ;
 
 	/* verify file size large enough to contain value */
-	tmp64 = value_len + sizeof(hdr) + key_len;
+	tmp64 = value_len + sizeof(hdr) + key_len + csum_bytes;
 	if (G_UNLIKELY(st.st_size < tmp64)) {
 		applog(LOG_ERR, "obj(%s) size error, too small", obj->in_fn);
 		goto err_out;
 	}
 
+	/* verify expected size of checksum table */
+	if (G_UNLIKELY(fs_blk_count(value_len) != obj->n_blk)) {
+		applog(LOG_ERR, "obj(%s) unexpected blk count "
+		       "(%u from val sz, %u from hdr)",
+		       obj->in_fn, fs_blk_count(value_len), obj->n_blk);
+		goto err_out;
+	}
+
+	obj->csum_tbl = malloc(csum_bytes);
+	if (!obj->csum_tbl)
+		goto err_out;
+	obj->csum_tbl_sz = csum_bytes;
+
 	obj->bo.key = malloc(key_len);
 	obj->bo.key_len = key_len;
 	if (!obj->bo.key)
 		goto err_out;
 
-	/* read object variable-length header */
-	rrc = read(obj->in_fd, obj->bo.key, key_len);
-	if ((rrc != key_len) || (memcmp(key, obj->bo.key, key_len))) {
-		applog(LOG_ERR, "read hdr key obj(%s) failed: %s",
+	/* init additional header segment list */
+	iov[0].iov_base = obj->bo.key

[PATCH 0/3] update chunkd checksum verification scheme

2010-07-15 Thread Jeff Garzik

This patchset is part of the work necessary to get ranged-GET (aka
partial GET) working.  As explained in
http://marc.info/?l=hail-devel&m=127871407125539&w=2 the current chunkd
checksum scheme does not work at all for partial retrievals, and must be
revamped.

These patches present step 1 of 4, adding a table of checksums to
chunkd's local on-disk format.

There are no protocol or API changes in this patchset; existing clients
should work fine without any changes.

Nevertheless, this will not be committed to the main branch until
partial retrieval is actually implemented.  I don't commit changes
unless they are actually needed.  This checksum table and sendfile
removal work is not required until partial-GET actually exists.

Jeff




--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] chunkd: remove sendfile(2) support

2010-07-15 Thread Jeff Garzik
commit d663521ba7e6a808be02633e57dbeb7a95973c0f
Author: Jeff Garzik <j...@garzik.org>
Date:   Thu Jul 15 13:50:10 2010 -0400

chunkd: remove sendfile(2) zero-copy support

    chunkd will soon be checksumming data in main memory.  That removes
    the utility of a zero-copy interface which bypasses the on-heap
    data requirement.

    Signed-off-by: Jeff Garzik <jgar...@redhat.com>

 chunkd/be-fs.c  |   60 
 chunkd/chunkd.h |   14 -
 chunkd/object.c |   31 
 chunkd/server.c |   28 --
 configure.ac|3 --
 5 files changed, 15 insertions(+), 121 deletions(-)

diff --git a/chunkd/be-fs.c b/chunkd/be-fs.c
index f72ed48..5c97388 100644
--- a/chunkd/be-fs.c
+++ b/chunkd/be-fs.c
@@ -25,9 +25,6 @@
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <sys/uio.h>
-#if defined(HAVE_SYS_SENDFILE_H)
-#include <sys/sendfile.h>
-#endif
 #include <stdlib.h>
 #include <unistd.h>
 #include <stdio.h>
@@ -52,7 +49,6 @@ struct fs_obj {
 
 	int			in_fd;
 	char			*in_fn;
-	off_t			sendfile_ofs;
 };
 
 struct be_fs_obj_hdr {
@@ -542,62 +538,6 @@ ssize_t fs_obj_write(struct backend_obj *bo, const void *ptr, size_t len)
return rc;
 }
 
-#if defined(HAVE_SENDFILE) && defined(__linux__)
-
-ssize_t fs_obj_sendfile(struct backend_obj *bo, int out_fd, size_t len)
-{
-	struct fs_obj *obj = bo->private;
-	ssize_t rc;
-
-	if (obj->sendfile_ofs == 0) {
-		obj->sendfile_ofs += sizeof(struct be_fs_obj_hdr);
-		obj->sendfile_ofs += bo->key_len;
-	}
-
-	rc = sendfile(out_fd, obj->in_fd, &obj->sendfile_ofs, len);
-	if (rc < 0)
-		applog(LOG_ERR, "obj sendfile(%s) failed: %s",
-		       obj->in_fn, strerror(errno));
-
-	return rc;
-}
-
-#elif defined(HAVE_SENDFILE) && defined(__FreeBSD__)
-
-ssize_t fs_obj_sendfile(struct backend_obj *bo, int out_fd, size_t len)
-{
-	struct fs_obj *obj = bo->private;
-	ssize_t rc;
-	off_t sbytes = 0;
-
-	if (obj->sendfile_ofs == 0) {
-		obj->sendfile_ofs += sizeof(struct be_fs_obj_hdr);
-		obj->sendfile_ofs += bo->key_len;
-	}
-
-	rc = sendfile(obj->in_fd, out_fd, obj->sendfile_ofs, len,
-		      NULL, &sbytes, 0);
-	if (rc < 0) {
-		applog(LOG_ERR, "obj sendfile(%s) failed: %s",
-		       obj->in_fn, strerror(errno));
-		return rc;
-	}
-
-	obj->sendfile_ofs += sbytes;
-
-	return sbytes;
-}
-
-#else
-
-ssize_t fs_obj_sendfile(struct backend_obj *bo, int out_fd, size_t len)
-{
-	applog(LOG_ERR, "BUG: sendfile used but not supported");
-	return -EOPNOTSUPP;
-}
-
-#endif /* HAVE_SENDFILE && HAVE_SYS_SENDFILE_H */
-
 bool fs_obj_write_commit(struct backend_obj *bo, const char *user,
 unsigned char *md, bool sync_data)
 {
diff --git a/chunkd/chunkd.h b/chunkd/chunkd.h
index 1e1b1d3..1e3741a 100644
--- a/chunkd/chunkd.h
+++ b/chunkd/chunkd.h
@@ -48,8 +48,6 @@ enum {
STD_COOKIE_MIN  = 7,
 
STD_TRASH_MAX   = 1000,
-
-   CLI_MAX_SENDFILE_SZ = 512 * 1024,
 };
 
 struct client;
@@ -63,7 +61,6 @@ struct client_write {
 	uint64_t		len;		/* write buffer length */
 	cli_write_func		cb;		/* callback */
 	void			*cb_data;	/* data passed to cb */
-	bool			sendfile;	/* using sendfile? */
 
 	struct list_head	node;
 };
@@ -275,7 +272,6 @@ extern bool fs_obj_delete(uint32_t table_id, const char *user,
  const void *kbuf, size_t klen,
  enum chunk_errcode *err_code);
 extern int fs_obj_disable(const char *fn);
-extern ssize_t fs_obj_sendfile(struct backend_obj *bo, int out_fd, size_t len);
 extern int fs_list_objs_open(struct fs_obj_lister *t,
 const char *root_path, uint32_t table_id);
 extern int fs_list_objs_next(struct fs_obj_lister *t, char **fnp);
@@ -330,7 +326,6 @@ extern void applog(int prio, const char *fmt, ...);
 extern bool cli_err(struct client *cli, enum chunk_errcode code, bool recycle_ok);
 extern int cli_writeq(struct client *cli, const void *buf, unsigned int buflen,
 cli_write_func cb, void *cb_data);
-extern bool cli_wr_sendfile(struct client *, cli_write_func);
 extern bool cli_rd_set_poll(struct client *cli, bool readable);
 extern void cli_wr_set_poll(struct client *cli, bool writable);
 extern bool cli_cb_free(struct client *cli, struct client_write *wr,
@@ -349,15 +344,6 @@ extern void read_config(void);
 /* selfcheck.c */
 extern int chk_spawn(TCHDB *hdb);
 
-static inline bool use_sendfile(struct client *cli)
-{
-#if defined(HAVE_SENDFILE) && defined(HAVE_SYS_SENDFILE_H)
-   return cli->ssl ? false : true;
-#else

[PATCH 3/3] chunkd: on-disk format stores per-64k checksums

2010-07-15 Thread Jeff Garzik
commit e6fcc02bea062af291148771a59ee2028ae98834
Author: Jeff Garzik <j...@garzik.org>
Date:   Thu Jul 15 13:57:17 2010 -0400

chunkd: Add checksum table to on-disk format, one sum per 64k of data

Signed-off-by: Jeff Garzik <jgar...@redhat.com>

 chunkd/be-fs.c |  145 +
 1 file changed, 127 insertions(+), 18 deletions(-)
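The block arithmetic this patch introduces can be checked in isolation. The following is a minimal sketch, not chunkd code: CHD_CSUM_SZ is assumed to be 20 bytes (one binary SHA-1 digest), and csum_tbl_bytes() and obj_size_ok() are hypothetical helpers that condense the table sizing in fs_obj_new() and the two sanity checks in fs_obj_open().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same constants as the patch: one checksum per 64 KiB block. */
enum {
	CHUNK_BLK_ORDER	= 16,
	CHUNK_BLK_SZ	= 1 << CHUNK_BLK_ORDER,
};

/* Assumption: CHD_CSUM_SZ is the size of a binary SHA-1 digest. */
#define CHD_CSUM_SZ 20

/* Number of 64 KiB blocks covering data_len bytes, rounding up. */
static unsigned int fs_blk_count(uint64_t data_len)
{
	uint64_t n_blk = data_len >> CHUNK_BLK_ORDER;

	if (data_len & (CHUNK_BLK_SZ - 1))
		n_blk++;		/* partial trailing block */

	return (unsigned int) n_blk;
}

/* Hypothetical helper: bytes occupied by the on-disk checksum table. */
static size_t csum_tbl_bytes(uint64_t data_len)
{
	return (size_t) fs_blk_count(data_len) * CHD_CSUM_SZ;
}

/*
 * Hypothetical helper: both sanity checks made when opening an object.
 * file_size is st_size; hdr_sz stands in for sizeof(struct be_fs_obj_hdr).
 */
static bool obj_size_ok(uint64_t file_size, size_t hdr_sz, size_t key_len,
			uint64_t value_len, uint32_t hdr_n_blk)
{
	uint64_t need;

	/* block count stored in the header must match the value length */
	if (fs_blk_count(value_len) != hdr_n_blk)
		return false;

	/* header + key + checksum table + value must all fit in the file */
	need = value_len + hdr_sz + key_len +
	       (uint64_t) hdr_n_blk * CHD_CSUM_SZ;
	return file_size >= need;
}
```

With these definitions, a 65,537-byte value spills into a second block (40 bytes of checksums), and a 1 MiB value needs 16 blocks (320 bytes). Because the new n_blk header field takes 4 of the 16 formerly reserved bytes, the header itself keeps its size; only the checksum table grows the front-of-file metadata area.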

diff --git a/chunkd/be-fs.c b/chunkd/be-fs.c
index 671c8fd..1bd85ea 100644
--- a/chunkd/be-fs.c
+++ b/chunkd/be-fs.c
@@ -40,6 +40,11 @@
 
 #define BE_FS_OBJ_MAGIC "CHU1"
 
+enum {
+   CHUNK_BLK_ORDER = 16,   /* 64k blocks */
+   CHUNK_BLK_SZ    = 1 << CHUNK_BLK_ORDER,
+};
+
 struct fs_obj {
struct backend_obj  bo;
 
@@ -49,14 +54,23 @@ struct fs_obj {
 
int in_fd;
 char *in_fn;
+
+   size_t  checked_bytes;
+   SHA_CTX checksum;
+   unsigned int csum_idx;
+   void *csum_tbl;
+   size_t  csum_tbl_sz;
+
+   unsigned int n_blk;
 };
 
 struct be_fs_obj_hdr {
 char magic[4];
 uint32_t key_len;
 uint64_t value_len;
+   uint32_t n_blk;
 
-   char reserved[16];
+   char reserved[12];
 
 unsigned char   hash[CHD_CSUM_SZ];
 char owner[128];
@@ -204,6 +218,8 @@ static struct fs_obj *fs_obj_alloc(void)
 obj->out_fd = -1;
 obj->in_fd = -1;
 
+   SHA1_Init(&obj->checksum);
+
return obj;
 }
 
@@ -314,6 +330,17 @@ static bool key_valid(const void *key, size_t key_len)
return true;
 }
 
+static unsigned int fs_blk_count(uint64_t data_len)
+{
+   uint64_t n_blk;
+
+   n_blk = data_len >> CHUNK_BLK_ORDER;
+   if (data_len & (CHUNK_BLK_SZ - 1))
+   n_blk++;
+
+   return (unsigned int) n_blk;
+}
+
 struct backend_obj *fs_obj_new(uint32_t table_id,
   const void *key, size_t key_len,
   uint64_t data_len,
@@ -321,6 +348,7 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 {
struct fs_obj *obj;
char *fn = NULL;
+   size_t csum_bytes;
enum chunk_errcode erc = che_InternalError;
off_t skip_len;
 
@@ -335,6 +363,13 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
return NULL;
}
 
+   obj->n_blk = fs_blk_count(data_len);
+   csum_bytes = obj->n_blk * CHD_CSUM_SZ;
+   obj->csum_tbl = malloc(csum_bytes);
+   if (!obj->csum_tbl)
+   goto err_out;
+   obj->csum_tbl_sz = csum_bytes;
+
/* build local fs pathname */
fn = fs_obj_pathname(table_id, key, key_len);
if (!fn)
@@ -355,7 +390,7 @@ struct backend_obj *fs_obj_new(uint32_t table_id,
 obj->out_fn = fn;
 
/* calculate size of front-of-file metadata area */
-   skip_len = sizeof(struct be_fs_obj_hdr) + key_len;
+   skip_len = sizeof(struct be_fs_obj_hdr) + key_len + csum_bytes;
 
/* position file pointer where object data (as in, not metadata)
 * will begin
@@ -391,8 +426,11 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
struct stat st;
struct be_fs_obj_hdr hdr;
ssize_t rrc;
-   uint64_t value_len;
+   uint64_t value_len, tmp64;
+   size_t csum_bytes;
enum chunk_errcode erc = che_InternalError;
+   struct iovec iov[2];
+   size_t total_rd_len;
 
if (!key_valid(key, key_len)) {
*err_code = che_InvalidKey;
@@ -447,25 +485,49 @@ struct backend_obj *fs_obj_open(uint32_t table_id, const char *user,
}
 
/* verify object key length matches input key length */
-   if (GUINT32_FROM_LE(hdr.key_len) != key_len)
+   if (G_UNLIKELY(GUINT32_FROM_LE(hdr.key_len) != key_len))
goto err_out;
 
-   /* verify file size large enough to contain value */
value_len = GUINT64_FROM_LE(hdr.value_len);
-   if ((st.st_size - sizeof(hdr) - key_len) < value_len) {
+   obj->n_blk = GUINT32_FROM_LE(hdr.n_blk);
+   csum_bytes = obj->n_blk * CHD_CSUM_SZ;
+
+   /* verify file size large enough to contain value */
+   tmp64 = value_len + sizeof(hdr) + key_len + csum_bytes;
+   if (G_UNLIKELY(st.st_size < tmp64)) {
 applog(LOG_ERR, "obj(%s) unexpected size change", obj->in_fn);
 goto err_out;
 }
 
+   /* verify expected size of checksum table */
+   if (G_UNLIKELY(fs_blk_count(value_len) != obj->n_blk)) {
+   applog(LOG_ERR, "obj(%s) unexpected blk count "
+  "(%u from val sz, %u from hdr)",
+  obj->in_fn, fs_blk_count(value_len), obj->n_blk);
+   goto err_out;
+   }
+
+   obj->csum_tbl = malloc(csum_bytes
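The new checked_bytes, csum_idx and csum_tbl fields suggest a running checksum that is finalized at every 64 KiB boundary while object data is written. The sketch below shows only that bookkeeping, under stated assumptions: to stay self-contained it swaps SHA-1 for 32-bit FNV-1a, and struct blk_sum and the blk_sum_*() functions are hypothetical names, not part of chunkd.

```c
#include <stddef.h>
#include <stdint.h>

enum { BLK_SZ = 1 << 16 };	/* 64 KiB, like CHUNK_BLK_SZ in the patch */

/* FNV-1a 32-bit constants: a stand-in for SHA-1 in this sketch only. */
#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

struct blk_sum {
	size_t		checked_bytes;	/* bytes hashed in current block */
	uint32_t	running;	/* running hash of current block */
	unsigned int	csum_idx;	/* next free slot in csum_tbl */
	uint32_t	*csum_tbl;	/* one entry per block */
};

static void blk_sum_init(struct blk_sum *s, uint32_t *tbl)
{
	s->checked_bytes = 0;
	s->running = FNV_OFFSET;
	s->csum_idx = 0;
	s->csum_tbl = tbl;
}

/* Store the current block's hash and start a fresh block. */
static void blk_sum_flush(struct blk_sum *s)
{
	s->csum_tbl[s->csum_idx++] = s->running;
	s->running = FNV_OFFSET;
	s->checked_bytes = 0;
}

/* Feed a write of any size; blocks are cut at exact 64 KiB offsets. */
static void blk_sum_feed(struct blk_sum *s, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len > 0) {
		size_t room = BLK_SZ - s->checked_bytes;
		size_t n = len < room ? len : room;

		for (size_t i = 0; i < n; i++) {
			s->running ^= p[i];
			s->running *= FNV_PRIME;
		}
		s->checked_bytes += n;
		p += n;
		len -= n;

		if (s->checked_bytes == BLK_SZ)
			blk_sum_flush(s);
	}
}

/* At commit time, a partial trailing block still gets a checksum. */
static void blk_sum_finish(struct blk_sum *s)
{
	if (s->checked_bytes > 0)
		blk_sum_flush(s);
}
```

Because block boundaries depend only on byte offsets, feeding the same data in different write sizes yields the same table. That is presumably the point of per-block sums: one 64 KiB block can be verified without rehashing the whole object.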

Re: New 'hail' repository created, with major packaging rework

2010-07-07 Thread Jeff Garzik

On 07/06/2010 11:24 AM, Pete Zaitcev wrote:

On Mon, 05 Jul 2010 15:22:40 -0400
Jeff Garzik <j...@garzik.org> wrote:


Moving libhttpstor is now a simple matter of simultaneous commits to
hail.git and tabled.git, moving the code and updating build machinery.


BTW, I suggest we do it differently: rename the functions and
the struct httpstor as they are introduced in libhail (without
changing anything else, to prevent accidental regressions).
This way, tabled and our out-of-tree tests can continue to build
for a couple of days and smoothly switch over to new libraries.


OK, just pushed the following out to hail.git.  If people disagree with 
naming, now's the time to speak up.



commit 5188f48dd3c73ce86f2bc453a326ee0bf40fd6db
Author: Jeff Garzik <j...@garzik.org>
Date:   Wed Jul 7 02:16:28 2010 -0400

libhail: Import httpstor, httputil modules from tabled

With the following transformations:

s/req_/hreq_/
s/httpstor_/hstor_/
s//huri_/
s//hutil_/

Signed-off-by: Jeff Garzik <jgar...@redhat.com>




Re: [PATCH] tabled: use httpstor API from libhail

2010-07-07 Thread Jeff Garzik
On Wed, Jul 07, 2010 at 06:38:22PM -0400, Jeff Garzik wrote:
 Just committed the following to tabled.git on my local laptop, on a side
 branch.  This won't be pushed onto the main tabled branch until Friday,
 to give people time to convert as zaitcev suggested in the 'new hail
 repository' thread.

This has now been pushed, as branch 'libhail-merge'.

Branch master, aka the main trunk, remains untouched until Friday (unless
some critical tabled issue arises before then, of course).


 As a side note, this requires a couple hail.git commits that will be
 pushed to upstream hail.git from my local laptop in a couple hours
 (movement of uri_parse from tabled's libhttpstor into libhail), so
 you'll need to update hail.git before being able to use the patch below.

These hail.git commits have now been pushed:

commit c7b833069e28cf9bddb69f46bb5e09138ab4984d
libhail: add huri_parse API (imported from tabled)

commit 55b0c57ca8f2b6beecea5a4680d76f45a7c32c28
Fix .gitignore issue causing test/chunkd/ to be completely ignored.

commit 5188f48dd3c73ce86f2bc453a326ee0bf40fd6db
libhail: Import httpstor, httputil modules from tabled

Jeff





Re: [PATCH] chunkd: add cp command, for local intra-table copies

2010-07-06 Thread Jeff Garzik

On 07/06/2010 11:17 AM, Pete Zaitcev wrote:

On Tue, 6 Jul 2010 03:24:29 -0400
Jeff Garzik <j...@garzik.org> wrote:


The following patch, against current hail.git, adds the CP command to
chunkd, permitting copying from object->object inside a single table.


What is it for?


Fun!  :)

More seriously, it is mainly an infrastructure patch, adding things that 
the upcoming RCP command will use.  As CP is far less complex, this 
allows me to verify several bits of machinery before moving forward.  I 
imagine CP will be tangentially helpful, but not a crucial feature in 
and of itself.


Jeff






Re: [PATCH] chunkd: add cp command, for local intra-table copies

2010-07-06 Thread Jeff Garzik

On 07/06/2010 11:17 AM, Pete Zaitcev wrote:

On Tue, 6 Jul 2010 03:24:29 -0400
Jeff Garzik <j...@garzik.org> wrote:


The following patch, against current hail.git, adds the CP command to
chunkd, permitting copying from object->object inside a single table.


What is it for?


Here's a real-world example.

Quoting from the S3 documentation, this describes the PUT (copy) 
operation, something that tabled does not yet support, but should:


This implementation of the PUT operation creates a copy of an
object that is already stored in Amazon S3. A PUT copy
operation is the same as performing a GET and then a PUT.
Adding the request header, x-amz-copy-source, makes the PUT
operation copy the source object into the destination bucket.

Assuming that a given tabled object is already fully replicated -- 
HOPEFULLY the common case for us -- the least expensive way to implement 
this is


for each chunkd containing object OLD_KEY

CHO_CP(object OLD_KEY -> object NEW_KEY)

Assuming each chunkd node has the necessary free space, this method 
totally avoids using network bandwidth when creating a copy of an object.
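The copy strategy above amounts to a loop over the nodes that already hold OLD_KEY. In this sketch, struct chunk_node and its local_cp callback are hypothetical stand-ins for a chunkd connection issuing CHO_CP; the point is that only key names are sent, never the object data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one chunkd node, for illustration only. */
struct chunk_node {
	const char	*name;
	bool		has_old_key;	/* does this node store OLD_KEY? */
	/* stand-in for sending CHO_CP(old_key -> new_key) to the node */
	bool		(*local_cp)(struct chunk_node *n,
				    const char *old_key, const char *new_key);
	unsigned int	copies;		/* local copies performed (demo) */
};

/*
 * Ask every replica of old_key to copy it locally to new_key.
 * Returns the number of nodes that reported success.
 */
static size_t s3_put_copy(struct chunk_node *nodes, size_t n_nodes,
			  const char *old_key, const char *new_key)
{
	size_t ok = 0;

	for (size_t i = 0; i < n_nodes; i++) {
		if (!nodes[i].has_old_key)
			continue;	/* no replica here: nothing to copy */
		if (nodes[i].local_cp(&nodes[i], old_key, new_key))
			ok++;
	}
	return ok;
}

/* Demo callback: pretend the local copy succeeded. */
static bool demo_cp(struct chunk_node *n, const char *old_key,
		    const char *new_key)
{
	if (strcmp(old_key, new_key) == 0)
		return false;		/* refuse a self-copy */
	n->copies++;
	return true;
}
```

A real implementation would also need a policy for replicas that fail the copy; this sketch only counts successes.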


Jeff





tabled: Some Amazon S3 features to consider

2010-07-06 Thread Jeff Garzik


Here are a few interesting things that have appeared in the S3 API since 
its initial release:


1) Object versioning.  All objects are now uniquely identified by (key, 
version) pair.  API compatibility is maintained by supporting the notion 
of current version.


2) Object copying.  Rather than an expensive S3->client->S3 round-trip, 
you may supply the x-amz-copy-source header to the PUT operation, 
causing S3 to use an existing object's data as the source for the PUT.


3) Reduced redundancy.  x-amz-storage-class header may be used to specify 
normal durability (STANDARD) or reduced durability (REDUCED_REDUNDANCY).


4) Regions (localization).  Bucket locations may be set.  Project Hail 
services have some notion of location as well.  See if we can match up 
the two...


5) POST HTTP method.  POST is like PUT, but can be used directly from a 
browser.





[PATCH] chunk: add CP operation

2010-07-06 Thread Jeff Garzik

This patch

* adds local, intra-table copy operation to chunkd/libhail
* illustrates what files need updating, when adding a new op to chunk
* adds some 'worker' infrastructure which should help with future ops,
  notably remote copy (RCP)
* should assist tabled's implementation of S3 copy (x-amz-copy-source)

 chunkd/chunkd.h |   19 +++
 chunkd/object.c |  117 ++
 chunkd/server.c |  122 
 doc/chcli.8 |   13 -
 include/chunk_msg.h |2 
 include/chunkc.h|   10 +++
 lib/chunkdc.c   |   56 ++
 test/chunkd/Makefile.am |5 +
 tools/chcli.c   |   77 ++
 9 files changed, 409 insertions(+), 12 deletions(-)

diff --git a/chunkd/chunkd.h b/chunkd/chunkd.h
index e019f0d..5d39353 100644
--- a/chunkd/chunkd.h
+++ b/chunkd/chunkd.h
@@ -104,6 +104,8 @@ struct client {
 unsigned int req_used;   /* amount of req_buf in use */
 void *req_ptr;   /* start of unexamined data */
 uint16_t key_len;
+   unsigned int var_len; /* len of vari len record */
+   bool second_var; /* inside 2nd vari len rec? */
 
 char *hdr_start; /* current hdr start */
 char *hdr_end;   /* current hdr end (so far) */
@@ -124,6 +126,7 @@ struct client {
 char netbuf_out[CLI_DATA_BUF_SZ];
 char key[CHD_KEY_SZ];
 char table[CHD_KEY_SZ];
+   char key2[CHD_KEY_SZ];
 };
 
 struct backend_obj {
@@ -162,6 +165,14 @@ struct volume_entry {
char*owner; /* obj owner username */
 };
 
+struct worker_info {
+   enum chunk_errcode  err; /* error returned to pipe */
+   struct client   *cli;   /* associated client conn */
+
+   void (*thr_ev)(struct worker_info *);
+   void (*pipe_ev)(struct worker_info *);
+};
+
 struct server_stats {
unsigned long   poll;   /* number polls */
unsigned long   event;  /* events dispatched */
@@ -209,6 +220,10 @@ struct server {
 
GHashTable  *fd_info;
 
+   GThreadPool *workers;   /* global thread worker pool */
+   int max_workers;
+   int worker_pipe[2];
+
 struct list_head wr_trash;
 unsigned int trash_sz;
 
@@ -278,6 +293,7 @@ extern int fs_obj_do_sum(const char *fn, unsigned int klen, char **csump);
 extern bool object_del(struct client *cli);
 extern bool object_put(struct client *cli);
 extern bool object_get(struct client *cli, bool want_body);
+extern bool object_cp(struct client *cli);
 extern bool cli_evt_data_in(struct client *cli, unsigned int events);
 extern void cli_out_end(struct client *cli);
 extern void cli_in_end(struct client *cli);
@@ -314,12 +330,15 @@ extern bool cli_err(struct client *cli, enum chunk_errcode code, bool recycle_ok
 extern int cli_writeq(struct client *cli, const void *buf, unsigned int buflen,
 cli_write_func cb, void *cb_data);
 extern bool cli_wr_sendfile(struct client *, cli_write_func);
+extern bool cli_rd_set_poll(struct client *cli, bool readable);
 extern void cli_wr_set_poll(struct client *cli, bool writable);
 extern bool cli_cb_free(struct client *cli, struct client_write *wr,
bool done);
 extern bool cli_write_start(struct client *cli);
 extern int cli_req_avail(struct client *cli);
 extern int cli_poll_mod(struct client *cli);
+extern bool worker_pipe_signal(struct worker_info *wi);
+extern bool tcp_cli_event(int fd, short events, void *userdata);
 extern void resp_init_req(struct chunksrv_resp *resp,
   const struct chunksrv_req *req);
 
diff --git a/chunkd/object.c b/chunkd/object.c
index 116792f..af187b6 100644
--- a/chunkd/object.c
+++ b/chunkd/object.c
@@ -25,6 +25,7 @@
 #include <unistd.h>
 #include <string.h>
 #include <errno.h>
+#include <poll.h>
 #include <stdio.h>
 #include <syslog.h>
 #include <glib.h>
@@ -356,3 +357,119 @@ start_write:
return cli_write_start(cli);
 }
 
+static void worker_cp_thr(struct worker_info *wi)
+{
+   static const unsigned bufsz = (1 * 1024 * 1024);
+   void *buf = NULL;
+   struct client *cli = wi->cli;
+   struct backend_obj *obj = NULL, *out_obj = NULL;
+   enum chunk_errcode err = che_InternalError;
+   unsigned char md[SHA_DIGEST_LENGTH];
+   char hashstr[50];
+
+   buf = malloc(bufsz);
+   if (!buf)
+   goto out;
+
+   cli->in_obj = obj = fs_obj_open(cli->table_id, cli->user, cli->key2,
+   cli->var_len, &err);
+   if 

stor_obj_test

2010-07-06 Thread Jeff Garzik
This function seems to be missing the meat.  It retrieves then 
disposes of a keylist.


bool stor_obj_test(struct open_chunk *cep, uint64_t key)
{
struct st_keylist *klist;

if (!cep->stc)
return false;

klist = stc_keys(cep->stc);
if (!klist)
return false;
stc_free_keylist(klist);
return true;
}
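As written, the function returns true whenever the key list can be fetched, whatever the key. The check it appears to be missing is a membership test on the returned list. The sketch below uses a hypothetical singly linked key list, because the layout of struct st_keylist is not shown here; a real fix would walk whatever stc_keys() actually returns and compare keys in the same encoding tabled uses when storing them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key list node; not the real struct st_keylist. */
struct key_node {
	uint64_t	key;
	struct key_node	*next;
};

/* True only if key actually appears in the list. */
static bool keylist_contains(const struct key_node *head, uint64_t key)
{
	for (const struct key_node *k = head; k != NULL; k = k->next)
		if (k->key == key)
			return true;
	return false;
}
```

In stor_obj_test(), the result of such a search would be computed before stc_free_keylist() and returned after it.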




chunkd on-disk and network protocol format change

2010-07-06 Thread Jeff Garzik


The following commit introduces an incompatible chunkd change, which 
breaks compatibility with (a) existing on-disk chunkd databases, and (b) 
existing chunkd network protocol entities.


Prior to commit ea5d20bc22aeed077312c9c1824e84651af17a16, chunkd stored 
SHA1 checksums as ASCII, and sent them across the wire in each message 
in ASCII.


Converting these to directly store and use SHA1 binary checksums on-disk 
saves several memory allocations, and more importantly, shaves 44 bytes 
off each chunkd message.  ASCII is only needed in the XML-based 
list-objects output, so we only perform the conversion at list-objects time.
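The list-objects conversion described above needs only a small routine that expands the binary digest into hexadecimal text. A sketch, assuming a 20-byte SHA-1 digest; csum_to_hex() is a hypothetical name, not a chunkd function.

```c
#include <stddef.h>

enum { SHA1_LEN = 20 };		/* binary SHA-1 digest size */

/*
 * Expand a binary digest into lowercase hex. out must hold
 * 2 * SHA1_LEN + 1 bytes (40 hex digits plus the terminating NUL).
 */
static void csum_to_hex(const unsigned char md[SHA1_LEN], char *out)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < SHA1_LEN; i++) {
		out[2 * i]     = digits[md[i] >> 4];
		out[2 * i + 1] = digits[md[i] & 0x0f];
	}
	out[2 * SHA1_LEN] = '\0';
}
```

The 41-byte string exists only while the XML reply is being built; the on-disk header and the wire protocol carry just the 20 raw bytes.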


Jeff



commit ea5d20bc22aeed077312c9c1824e84651af17a16
Author: Jeff Garzik <j...@garzik.org>
Date:   Wed Jul 7 00:51:48 2010 -0400

[chunk] protocol, disk fmt: Replace ASCII checksum representation with binary


Rather than converting SHA1 checksums back and forth between ASCII
and binary, always store and compare binary checksums.  Only convert
to ASCII when performing a list-objects request, which requires
XML output.

Among other savings, this decreases the size of the per-message
fixed-length header by 44 bytes.

Signed-off-by: Jeff Garzik <jgar...@redhat.com>

 chunkd/be-fs.c | 47 +++
 chunkd/chunkd.h|  9 +
 chunkd/object.c| 14 --
 chunkd/selfcheck.c | 19 +++
 include/chunk_msg.h|  4 ++--
 5 files changed, 37 insertions(+), 56 deletions(-)




Re: New 'hail' repository created, with major packaging rework

2010-07-05 Thread Jeff Garzik

On 07/05/2010 03:13 PM, Pete Zaitcev wrote:

On Fri, 02 Jul 2010 02:59:20 -0400
Jeff Garzik <j...@garzik.org> wrote:


git://git.kernel.org/pub/scm/daemon/distsrv/hail.git



libhail is a single shared library binary, linking together cldc, ncld,
libtimer, and chunkdc modules.  In other words, libhail at present is a
simplistic combination of cld/lib and chunkd/lib.


[zait...@lembas hail-tip]$ ls lib include
include:
chunkc.h chunksrv.hcld-private.h  Makefile ncld.h
chunk_msg.h  cldc.helist.hMakefile.am  objcache.h
chunk-private.h  cld_common.h  hail_log.h Makefile.in

lib:
chunkdc.c   cldc-udp.c libhail.pc.in  Makefile
chunksrv.c  cld_msg_rpc.x  libhail-uninstalled.pc Makefile.am
cldc.c  common.c   libhail-uninstalled.pc.in  Makefile.in
cldc-dns.c  libhail.pc libtimer.c pkt.c
[zait...@lembas hail-tip]$ grep httpstor lib/*.c
[zait...@lembas hail-tip]$

What has happened to the plan to include httpstor into libhail?


Still planned, and can easily be done.  Important first step was getting 
the foundation laid -- creating hail.git, and synchronizing hail.git and 
tabled.git, and associated RPM packaging.


Moving libhttpstor is now a simple matter of simultaneous commits to 
hail.git and tabled.git, moving the code and updating build machinery.


I can release a hail 0.7.1 and tabled 0.5.1 with this change, if you 
feel versioning and pushing out this libhttpstor change is highly important.


(or you can do that yourself, doesn't make a difference to me)

Jeff





chunkd near-term enhancements

2010-07-04 Thread Jeff Garzik


Here are a few chunkd enhancements that are currently on my drawing 
board, for the near term:


(CHO_xxx denotes new chunkd network protocol commands, as listed in 
include/chunk_msg.h)


* CHO_SET_SERVERS:
chunkd shall maintain a per-connection buffer known as SERVER_LIST. 
This chunkd command is issued by the client prior to using a 
SERVER_LIST-related command (see below), to reset the contents of the 
connection's SERVER_LIST buffer.


* CHO_RCP:
copy a single object to each remote server in SERVER_LIST

* CHO_PUT_THRU:
like PUT, but causes chunkd to further replicate the incoming object to 
each remote server in SERVER_LIST


* CHO_APPEND:
append data onto an object.

* CHO_APPEND_THRU:
append data locally, and replicate to each remote server in SERVER_LIST

The authentication used in chunkd-chunkd connections is the logged-in 
username/shared-secret combination.
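One way to picture the proposed per-connection SERVER_LIST: CHO_SET_SERVERS replaces the list's contents, and CHO_RCP (or the _THRU variants) then iterate over it. This is a hypothetical sketch; the proposal specifies no wire format or limits, so MAX_SERVERS, HOST_LEN, struct server_list and the function names are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_SERVERS = 16, HOST_LEN = 64 };	/* assumed limits */

/* Hypothetical per-connection SERVER_LIST state. */
struct server_list {
	size_t	n;
	char	host[MAX_SERVERS][HOST_LEN];
};

/*
 * CHO_SET_SERVERS: validate the new list first, then replace the old
 * contents, so a bad request leaves the previous list intact.
 */
static bool set_servers(struct server_list *sl, const char *const *hosts,
			size_t n)
{
	if (n > MAX_SERVERS)
		return false;
	for (size_t i = 0; i < n; i++)
		if (strlen(hosts[i]) >= HOST_LEN)
			return false;

	for (size_t i = 0; i < n; i++)
		memcpy(sl->host[i], hosts[i], strlen(hosts[i]) + 1);
	sl->n = n;
	return true;
}

/* CHO_RCP-style fan-out: call send_one for every listed server. */
static size_t for_each_server(const struct server_list *sl,
			      bool (*send_one)(const char *host, void *arg),
			      void *arg)
{
	size_t ok = 0;

	for (size_t i = 0; i < sl->n; i++)
		if (send_one(sl->host[i], arg))
			ok++;
	return ok;
}

/* Demo callback: count calls, succeed for any non-empty host name. */
static bool demo_send(const char *host, void *arg)
{
	unsigned int *calls = arg;

	(*calls)++;
	return host[0] != '\0';
}
```

Keeping the list per connection means a client sets it once and can then issue several RCP or _THRU commands against the same targets.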


Jeff







New 'hail' repository created, with major packaging rework

2010-07-02 Thread Jeff Garzik


A new git repository

git://git.kernel.org/pub/scm/daemon/distsrv/hail.git

was created, preserving the full histories of cld.git and chunkd.git. 
The existing cld.git and chunkd.git repositories have been left 
untouched, for now.  I also have not yet updated tabled.git for this new 
work, though it should be an easy matter of linking against libhail 
rather than other libs.


This new repository creates hail-$VERSION.tar.gz tarballs via make 
distcheck, producing libhail, cld and chunkd binaries.


libhail is a single shared library binary, linking together cldc, ncld, 
libtimer, and chunkdc modules.  In other words, libhail at present is a 
simplistic combination of cld/lib and chunkd/lib.


The RPM package specfile has been updated (pkg/hail.spec) to generate 
the following complement of packages on Fedora:


Wrote: /garz/rpm/SRPMS/hail-0.7-0.1.gc69acd63.fc12.src.rpm
Wrote: /garz/rpm/RPMS/x86_64/hail-0.7-0.1.gc69acd63.fc12.x86_64.rpm
- contains libhail
Wrote: /garz/rpm/RPMS/x86_64/hail-cld-0.7-0.1.gc69acd63.fc12.x86_64.rpm
- contains cld
Wrote: /garz/rpm/RPMS/x86_64/hail-chunkd-0.7-0.1.gc69acd63.fc12.x86_64.rpm
- contains chunkd
Wrote: /garz/rpm/RPMS/x86_64/hail-devel-0.7-0.1.gc69acd63.fc12.x86_64.rpm
- contains libhail devel libs, headers
Wrote: /garz/rpm/RPMS/x86_64/hail-debuginfo-0.7-0.1.gc69acd63.fc12.x86_64.rpm



rpmlint still issues several warnings about hail-cld and hail-chunkd 
packages.  That must be fixed before this package suite rename can be 
submitted to Fedora (pkg renames must be submitted as new packages, and 
go through the pkg review process all over again).


To produce hail*.rpm packages on Fedora, I would do something like this:

1) set up rpm build directories (== $RBD in this example)
2) git clone git://git.kernel.org/pub/scm/daemon/distsrv/hail.git
3) cd hail
4) ./autogen.sh
5) ./configure
6) make -s dist
7) cp *.tar.gz pkg/*.init pkg/*.sysconf $RBD/SOURCES
8) cp pkg/hail.spec $RBD/SPECS
9) cd $RBD
10) rpmbuild -ba SPECS/hail.spec


As mentioned above, the {cld,chunkd}.git repositories have been left 
untouched, so if something goes wildly wrong with this scheme, we can 
easily backtrack.


Comments welcome.

Jeff






Hail version 0.7 released

2010-07-02 Thread Jeff Garzik


Version 0.7 of hail core services has been released, at the expected places:

http://www.kernel.org/pub/software/network/distsrv/hail/
ftp://ftp.kernel.org/pub/software/network/distsrv/hail/
git://git.kernel.org/pub/scm/daemon/distsrv/hail.git

This release replaces separate chunkd and cld packages with a single 
'hail' package, which provides libhail, cld and chunkd binaries.


Release notes (from the NEWS file), showing changes since the last 
official cld/chunkd releases:


- cld and chunkd merged into single 'hail' package, providing
  libhail, cld and chunkd binaries.  libcldc and libchunkdc libraries
  no longer exist.
- cld: bug fixes
- cld: use XDR for all messages
- cldc: bug fixes
- cldc: improve verbose output
- cldc: add new 'ncld' client API
- add experimental 'cldfuse' FUSE filesystem
- support db 4.9, 5.0
- chunkd: bug fixes
- chunkd: update to ncld, fix CLD-related bugs
- chunkd: improve and canonicalize verbosity controls and output
- chunkd: be less inflexible about CLD paths
- chunkd: (protocol change) replace SSL/no-SSL split ports with
  in-band SSL negotiation
- chunkd: integrity self-checking
- chunkd: fix GET/PUT for larger than 2GB values
- chcli: bug fixes

As with prior cld/chunkd releases, there will be no attempt at backwards 
compatibility, API freeze or protocol freeze until just prior to 1.0 
release.  In this release, backwards incompatible cld and chunkd network 
protocol changes have occurred.


Jeff






tabled version 0.5 released

2010-07-02 Thread Jeff Garzik


Coinciding with hail core v0.7 release is this tabled release, v0.5, at 
the usual places:


git://git.kernel.org/pub/scm/daemon/distsrv/tabled.git
http://www.kernel.org/pub/software/network/distsrv/tabled/
ftp://ftp.kernel.org/pub/software/network/distsrv/tabled/

Release notes:

- update for hail v0.7, newly combined from cld+chunkd packages
- reduce CLD client verbosity
- check for db 4.8, 4.9, 5.0
- use new ncld API internally
- background replication thread
- config: add Group (one cell), drop StorageNode
- add new tests
- fixes for many serious bugs and crashes




Re: [tabled patch 1/1] Stagger the start-daemon

2010-07-01 Thread Jeff Garzik

On 06/30/2010 10:49 AM, Pete Zaitcev wrote:

My rule of thumb is that magic delays are evil or stupid, so I worked on
eliminating them from our scripts. However, in this case it's just not
worth it, because the result is that we have to wait way more than 100s
for several cycles of CLD timeouts to complete, not just one, before we
declare a failure. With this patch, all builds completed that I submitted
to Fedora build system.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
  test/start-daemon  |4 
  test/wait-for-listen.c |7 ++-
  2 files changed, 6 insertions(+), 5 deletions(-)


applied




[RFC] new structure: hail pkg instead of cld, chunkd

2010-06-29 Thread Jeff Garzik


I've been thinking about a new structure for the projects, namely having 
a single hail or hail-core package, that includes cld and chunkd 
services, and associated client libraries inside a new libhail.


In real terms, it would look like this:

cld         -> hail
libcldc     -> libhail, libhail-devel
chunkd      -> hail
libchunkdc  -> libhail, libhail-devel

tabled      -> tabled (no change)
libhttpstor -> libhail, libhail-devel
itd         -> itd (no change)
nfs4d       -> nfs4d (no change)

Core services (cld, chunkd), their associated client libs (libcldc, 
libchunkdc), and other useful common routines (libhttpstor) would find a 
new home in the hail RPM, providing cld, chunkd and libhail.


tabled, itd and nfs4d are considered hail applications, and live in their 
own separate packages, BuildRequire-ing the core hail packages.


I think this new organization will be more useful to both developers and 
future users.  For developers, changing the core services, and packaging 
commonly reused routines is easier.  For users, the core services and 
application separation is more clear, IMO easier to understand at a glance.


Comments?

Jeff






Re: Metadata replication in tabled

2010-06-25 Thread Jeff Garzik

On 06/24/2010 08:31 PM, Pete Zaitcev wrote:

I worked on fixing the metadata replication in tabled. There were some
difficulties in existing code, in particular the aliasing between the
hostname used to identify nodes and the hostname used in bind() for
listening was impossible to work around in repmgr. In the end I gave
up on repmgr and switched tabled to the Base API. So, the replication
works now... for some values of works, which is still a progress.

We essentially have a tabled that can really be considered as replicated.
Before, it was only data replication, which was great and all but
useless against disk failures in the tabled's database. I think it's
a major threshold for tabled.


er, huh?  In addition to data replication, we already have metadata 
replication via db4 repmgr in tabled.git, which ensures metadata db 
integrity in the case of disk or tabled node failure.


The core problem with current tabled.git is that S3 clients expect all 
nodes to support PUT/DELETE as well as GET.  Our current use w/ db4 
slave mode does not fulfill this client requirement.


Your work here, moving to the base replication API, eliminates several 
obstacles on the path to making all tabled nodes support PUT/DELETE. 
But it is not true to say that metadata replication did not exist prior 
to this patch.


With either repmgr or base API, we still need to make failover more 
transparent to our S3 clients.




Unfortunately, the code is rather ugly. I tried to create a kind
of an optional replication layer, so that tdbadm could be built
without it. Although I succeeded, the result is a hideous mess of
methods and callbacks, functions with side effects, and a bunch
of poorly laid out state machines. In places I cannot wrap my own
head around what's going on without a help of pencil and paper.

So, while working, it's not ready for going in. Still, I'm going
to throw it here in case I get hit by a bus, or if anyone wants
an example of using db4 replication early.


Based on a quick read, it seems straightforward, and looks like 
something I can try tomorrow...


Very excited to try this :)

Jeff






Re: Zookeeper instead of CLD in Hail

2010-06-07 Thread Jeff Garzik

On 06/04/2010 11:27 PM, Pete Zaitcev wrote:

I heard people say they cribbed from the same Chubby paper, but
it's bollocks. It's absolutely nothing like what Chubby implies.
No locks for one thing. To be sure, Zookeeper provides a canned
piece of code which implements locks, kinda like you can implement
compare-and-swap using Dekker's algorithm on a CPU that doesn't
have it. The canned lock creates sequenced files (using a ZK
server call that creates unique filenames), then sets some
watches (same as CLD offers), then re-reads the directory to
find the lowest number sequential file, which is the winner of
the lock. Haha, only serious. I tested it, it works, but ew.
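The canned lock Pete describes boils down to this: each contender creates a sequential node, and whoever owns the lowest sequence number holds the lock. Below is a sketch of only the winner-selection step, assuming ZooKeeper's convention of appending a 10-digit, zero-padded counter to sequential node names. The "lock-" prefix and lock_winner() are hypothetical, and watches and retries are omitted.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the index of the contender with the lowest sequence number,
 * or -1 if no name carries a parsable suffix. Names are assumed to
 * end in a 10-digit counter, e.g. "lock-0000000007".
 */
static int lock_winner(const char *const *names, size_t n)
{
	int best = -1;
	unsigned long best_seq = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(names[i]);
		char *end;
		unsigned long seq;

		if (len < 10)
			continue;	/* too short to carry a counter */
		seq = strtoul(names[i] + len - 10, &end, 10);
		if (*end != '\0')
			continue;	/* suffix was not all digits */
		if (best < 0 || seq < best_seq) {
			best = (int) i;
			best_seq = seq;
		}
	}
	return best;
}
```

A contender that does not win waits for a watch to fire and then re-reads the directory and repeats this scan, which is the extra round of work that makes the recipe feel heavier than a native lock.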


Yeah, the main similarity is...  both ZK and CLD offer some type of 
filesystem (with all that implies).  ZK is IMO not much like Chubby at 
all, in terms of focus / design goals.


Jeff




Re: [chunkd patch 4/6] Print client port

2010-05-25 Thread Jeff Garzik

On 05/21/2010 12:54 AM, Pete Zaitcev wrote:

-   host, sizeof(host), NULL, 0, NI_NUMERICHOST);
+   host, sizeof(host), port, sizeof(port), NI_NUMERICHOST);
host[sizeof(host) - 1] = 0;
-   applog(LOG_INFO, "client %s connected%s", host,
+   host[sizeof(port) - 1] = 0;



You truncate the wrong variable.
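For comparison, here is a standalone sketch of the same pattern with each buffer terminated by its own size. format_peer() and format_ipv4() are hypothetical helpers, and NI_NUMERICSERV is an added assumption so that the port prints as a number rather than being looked up as a service name.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format addr as "host:port" into out; returns 0 on success. */
static int format_peer(const struct sockaddr *addr, socklen_t addrlen,
		       char *out, size_t outlen)
{
	char host[64], port[16];

	if (getnameinfo(addr, addrlen, host, sizeof(host),
			port, sizeof(port),
			NI_NUMERICHOST | NI_NUMERICSERV) != 0)
		return -1;

	/* defensive termination, each buffer by its own size */
	host[sizeof(host) - 1] = 0;
	port[sizeof(port) - 1] = 0;

	snprintf(out, outlen, "%s:%s", host, port);
	return 0;
}

/* Demo: build an IPv4 sockaddr and format it. */
static int format_ipv4(const char *ip, unsigned short portnum,
		       char *out, size_t outlen)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(portnum);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return -1;
	return format_peer((struct sockaddr *) &sin, sizeof(sin), out, outlen);
}
```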



Re: [chunkd patch 1/6] Fix the leak of suddenly closed connections

2010-05-25 Thread Jeff Garzik

On 05/21/2010 12:54 AM, Pete Zaitcev wrote:

After a period of uptime, chunkd may stop working with this:

May 20 08:51:47 azdragon2 chunkd[4034]: tcp accept: Too many open files

An examination with lsof shows that file descriptors for sockets and
object data files are leaked in neat pairs. As it turns out, the root
cause is not processing the case when tabled opens a connection to
read an object, then closes it before the data is transferred.
On some systems, sendfile returns no error in such case, but the
amount of data that it attempted to send before it recognized that
the socket was closed. If that happens, chunkd will not receive a
POLLOUT indication and the struct cli will linger forever with
non-empty write queue.

The fix has two parts:

  1. Permit a client in evt_recycle state to process outstanding
     writes in the same manner as a client in evt_dispose does.

     Note that in our specific failure case no actual processing
     is going to occur, so this part has the effect of permitting
     the dispatch to work. If we do not do this, a POLLIN may
     throw us into the evt_read_fixed stage.

  2. Once we're getting dispatched, dispose of clients that
     had connections closed, using the unmaskable POLLHUP bit.

As an aside, tabled 0.5-0.7.x resets the connections when Firefox
asks for a file that was modified after a certain date. In that case,
tabled wants to know when the file was modified, so it reads the
header off chunkd. If it turns out that the client is not interested
in the data, tabled simply closes the connection without reading
whatever data has arrived. This may change in the future, but the
bug in chunkd should be fixed anyway, for general robustness.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>


applied 1-6, after fixing truncation bug newly introduced




Re: [chunkd patch 6/6] Make cli_wr_set_poll bool

2010-05-25 Thread Jeff Garzik

On 05/21/2010 12:54 AM, Pete Zaitcev wrote:

The upside of this cleanup is easier reading and evaluation, with
fewer control paths.

[This patch will only work if patch 2/6 is applied. Sorry.]

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 server/chunkd.h |    2 +-
 server/object.c |    3 +--
 server/server.c |   18 +-
 3 files changed, 7 insertions(+), 16 deletions(-)


ITYM Make cli_wr_set_poll void




Re: [tabled patch 1/1] fix the selection of chunk

2010-05-25 Thread Jeff Garzik

On 05/25/2010 11:30 PM, Pete Zaitcev wrote:

If a chunkserver goes down, tabled sometimes throws a phantom "object
not found". It happens because we keep hitting the same down node and
exhaust the retries. The existing code calls rand() every time and
hopes for the best, but this is too likely to end poorly.

The fix is to randomize only once, before the retry loop, and then
cycle through all available nodes deterministically. The same fix
would apply even if we used a better technique than random choice to
select an available chunkserver.

Also, we refactor the code just a little bit, so that the enormous
function object_get_body gets somewhat easier to follow.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>


applied
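The retry strategy described in the patch can be sketched as follows. Everything here is hypothetical and is not tabled's code. Nodes are modeled as an array of availability flags instead of real chunkserver connections, and the caller picks the random starting point once, for example with `rand() % n`.

```c
#include <stdbool.h>

/* Try each of the n nodes exactly once, starting at 'start' and wrapping
 * around.  node_up[i] stands in for "a fetch from node i succeeded".
 * Returns the index of the first working node, or -1 if all n failed.
 * *attempts (if non-NULL) receives the number of nodes tried. */
static int select_node(const bool *node_up, int n, int start, int *attempts)
{
	int i;

	if (attempts)
		*attempts = 0;
	if (n <= 0 || start < 0)
		return -1;
	for (i = 0; i < n; i++) {
		int idx = (start + i) % n;	/* deterministic cycle */

		if (attempts)
			(*attempts)++;
		if (node_up[idx])
			return idx;
	}
	return -1;
}
```

Calling rand() on every retry can hit the same dead node repeatedly. With this scheme, each node is tried at most once per request.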




Re: iSCSI front-end for Hail

2010-05-06 Thread Jeff Garzik


As of itd commit 196e8f317fc7202460d7adde93dac939caf23f5d, the iSCSI 
target daemon appears to survive stress tests, and does not leak memory. 
 I call that a good first milestone.


Jeff






Re: iSCSI front-end for Hail

2010-05-05 Thread Jeff Garzik
As of commit 23a5795e3ca555a6454b199e071482bb50655508, itd is passing 
integrity and stress tests from two test suites: iscsi-harness, found in 
the netbsd-iscsi package, and basic blkdev integrity tests using dd(1).


There is a whopping big memory leak that needs fixing, but the basics 
appear to be working.


Jeff




Re: [cld patch 1/1] use specified username in cldcli

2010-05-02 Thread Jeff Garzik

On 05/03/2010 12:07 AM, Pete Zaitcev wrote:

I suspect I copy-pasted over it when I converted to ncld, but anyhow this
patch seems to work and does what's expected for the --user flag.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 tools/cldcli.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/cldcli.c b/tools/cldcli.c
index 7c73091..79a1009 100644
--- a/tools/cldcli.c
+++ b/tools/cldcli.c
@@ -712,7 +712,7 @@ int main (int argc, char *argv[])
    dr = host_list->data;

    nsess = ncld_sess_open(dr->host, dr->port, &error, sess_event, NULL,
-           "cldcli", "cldcli", &cli_log);
+           our_user, our_user, &cli_log);
if (!nsess) {
if (error  1000) {


applied

PS. you sent this to j...@garzik.com, and unfortunately, I don't own 
that domain.  I should.  :)  For future reference, jgar...@pobox.com or 
j...@garzik.org are equivalent and should exist for the long term (even 
if I get fired from Red Hat or somesuch :)).


Jeff






Re: [chunkd patch 1/2] eradicate last vestiges of libevent

2010-05-01 Thread Jeff Garzik

On 05/01/2010 12:51 AM, Pete Zaitcev wrote:

We stopped using libevent in Chunk a while ago, but for some reason
not all references were removed. I tested this patch by building
on a fresh Fedora 13 system without libevent.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 configure.ac       |    3 ---
 pkg/chunkd.spec    |    2 +-
 server/Makefile.am |    2 +-
 3 files changed, 2 insertions(+), 5 deletions(-)


applied 1-2

Thanks for updating the email subject lines.

Jeff






iSCSI front-end for Hail

2010-05-01 Thread Jeff Garzik

Hail devs,

Project Hail was, in part, conceived as an umbrella of libraries and 
services enabling the mating of a well-known, Internet-standard API with 
a back-end that enables distributed storage.  tabled is an example of 
this:  it provides an application front-end compatible with the S3 API, 
using the Hail back-end services chunkd and CLD.


nfs4d[1] is a second, work-in-progress example.  nfs4d is a fully 
working NFSv4 front-end, waiting to be mated to the Hail back-end services.


A third example is something I poked at long ago, iSCSI.  The vinzvault 
announcement[2] got me thinking about the iSCSI target[3] daemon that I 
had worked on, a while ago.  vinzvault, sheepdog, DST, drbd, nbd and 
iSCSI all attempt to provide remote network attached storage, usually 
for storage on ephemeral virtual machines, similar to Amazon's Elastic 
Block Store (EBS) on their EC2 grid.


I dusted off my itd (iSCSI target daemon) project, fixed a bunch of 
bugs, and got it working[4] in the hopes that this might be useful to 
Hail or vinzvault or so.


itd is a remote iSCSI service exporting one or more slices of storage as 
a standard SCSI device on your system.  It is based off of 
'netbsd-iscsi' in Fedora, which is in turn based off an old, open source 
Intel codebase.  netbsd-iscsi seemed a more pliable codebase than the 
very-nice SCSI TGT project[5].


The web browsable itd tree (with git:// URL for cloning) can be found at 
http://git.kernel.org/?p=daemon/distsrv/itd.git


As I write this email, I am borrowing a lot of networking code from 
tabled, to convert from GNet over to the more-flexible TCP server 
codebase found in tabled -- notably the asynchronous background TCP 
writing code in tabled.  I hope to finish and commit this by the 
end of the weekend.


At that point, itd should be a fully compliant SCSI target, capable of 
reading/writing -- to a pre-allocated RAM space.  Once that milestone is 
reached, the RAM storage may be replaced with Hail components, or other 
gadgets like MongoDB[6], to provide scalable, distributed storage.
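A minimal sketch of how the RAM storage could sit behind a small interface, so that it can later be swapped out. The `blk_backend` operations table and the `ram_*` functions are hypothetical and are not itd's actual code. They assume the SCSI layer only needs byte-offset reads and writes against a fixed-size device.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backend interface: the SCSI command code calls only these
 * two operations, so a Hail-backed store could replace the RAM one. */
struct blk_backend {
	int (*read)(void *priv, uint64_t off, void *buf, size_t len);
	int (*write)(void *priv, uint64_t off, const void *buf, size_t len);
	void *priv;
};

struct ram_store {
	uint8_t *mem;
	uint64_t size;
};

static int ram_read(void *priv, uint64_t off, void *buf, size_t len)
{
	struct ram_store *rs = priv;

	if (off > rs->size || len > rs->size - off)
		return -1;		/* request past end of device */
	memcpy(buf, rs->mem + off, len);
	return 0;
}

static int ram_write(void *priv, uint64_t off, const void *buf, size_t len)
{
	struct ram_store *rs = priv;

	if (off > rs->size || len > rs->size - off)
		return -1;
	memcpy(rs->mem + off, buf, len);
	return 0;
}

/* Pre-allocate a zero-filled RAM store and wire it into a backend. */
static int ram_backend_init(struct blk_backend *be, struct ram_store *rs,
			    uint64_t size)
{
	rs->mem = calloc(1, size);
	if (!rs->mem)
		return -1;
	rs->size = size;
	be->read = ram_read;
	be->write = ram_write;
	be->priv = rs;
	return 0;
}
```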


Jeff


[1] https://hail.wiki.kernel.org/index.php/Nfs4d
[2] http://www.mail-archive.com/linux-clus...@redhat.com/msg08555.html
[3] a SCSI target is a remote network server, in SCSI parlance.  It is 
mated with an initiator, which is SCSI's term for client.
[4] Well, only small WRITEs work at the moment, but READ is fully 
working at high speeds.
[5] http://stgt.sourceforge.net/
[6] http://www.mongodb.org/



Re: [Patch 09/12] tabled: drop double prefixing

2010-04-18 Thread Jeff Garzik

On 04/18/2010 12:42 AM, Pete Zaitcev wrote:

On Fedora 14, the following is seen in syslog:

Apr 17 19:58:52 niphredil tabled: tabled: connecting to site
  hitlain.zaitcev.lan:8083: No route to host
Apr 17 19:58:56 niphredil tabled: tabled: DB_ENV->rep_elect:WARNING:
  nvotes (1) is sub-majority with nsites (2)

Drop the extra prefix, it only wastes screen space.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 lib/tdb.c |    7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


applied 9-12
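The duplicate appears to come from message text that repeats the "tabled: " prefix syslog already adds from its ident. The simplest fix is not to add the prefix at all. Where a message may arrive already prefixed, a helper such as the hypothetical `strip_ident()` below could drop it before logging. This is not the code from lib/tdb.c, which is not shown here.

```c
#include <string.h>

/* Hypothetical helper: if msg starts with "<ident>: ", return a pointer
 * just past that prefix; otherwise return msg unchanged. */
static const char *strip_ident(const char *msg, const char *ident)
{
	size_t n = strlen(ident);

	/* msg[n + 1] is only read once msg[n] is known to be ':' */
	if (strncmp(msg, ident, n) == 0 && msg[n] == ':' && msg[n + 1] == ' ')
		return msg + n + 2;
	return msg;
}
```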




Re: [Patch 1/8] CLD: cleanup: add cld_msg_rpc.x

2010-04-17 Thread Jeff Garzik

On 04/16/2010 10:18 PM, Pete Zaitcev wrote:

On Wed, 14 Apr 2010 15:55:01 -0400
Jeff Garzik <j...@garzik.org> wrote:


+++ b/lib/Makefile.am
@@ -27,6 +27,7 @@ libcldc_la_SOURCES=   \
common.c\
libtimer.c  \
pkt.c   \
+   cld_msg_rpc.x   \
cld_msg_rpc_xdr.c


that's quite strange, because I built an official rawhide copy just fine
without this...


Strange indeed, I re-checked and it went away now. Oh well.


I wonder if it's a problem with the 'clean' functionality.  The 
EXTRA_DIST line contains a list of things forced to be included in the 
tarball, typically used for things not contained in *_SOURCES.  AFAICT 
from the autoconf/automake docs, that is where sources for generated 
sources[1] should reside.  So I still wonder how it disappeared for you...


Jeff




[1] Brought to you by the Department of Redundant Redundancies




Re: Trivial Q about chunkd's main_loop

2010-04-17 Thread Jeff Garzik

On 04/17/2010 09:36 PM, Pete Zaitcev wrote:

Is there a reason why the main_loop in chunkd uses a naked
g_hash_table_lookup instead of srv_poll_lookup? Performance?

@@ -1681,8 +1681,7 @@ static int main_loop(void)

fired++;

-   sp = g_hash_table_lookup(chunkd_srv.fd_info,
-           GINT_TO_POINTER(pfd->fd));
+   sp = srv_poll_lookup(pfd->fd);
if (G_UNLIKELY(!sp)) {


Looks like it should be changed to call srv_poll_lookup(), indeed. 
srv_poll_lookup() is marked 'static', so there should not be any 
performance difference after the compiler's optimizer passes get 
finished with it.
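A sketch of the point about 'static'. A file-local wrapper has internal linkage, so an optimizing compiler sees every call site and will usually inline it, leaving no call overhead compared with the open-coded lookup. The plain fd-indexed array below is a hypothetical stand-in for chunkd's GHashTable, chosen only to keep the example free of GLib. It is not chunkd's implementation of srv_poll_lookup().

```c
#include <stddef.h>

/* Hypothetical per-fd poll record. */
struct server_poll {
	int fd;
	void *cb_data;
};

#define MAX_FDS 64
static struct server_poll *fd_table[MAX_FDS];

/* Internal linkage: all callers are in this file, so the optimizer is
 * free to inline this and drop the out-of-line copy entirely. */
static struct server_poll *srv_poll_lookup(int fd)
{
	if (fd < 0 || fd >= MAX_FDS)
		return NULL;
	return fd_table[fd];
}
```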


Jeff





tabled RPM build fails before it succeeds

2010-04-16 Thread Jeff Garzik

The same source, same spec.

Build #1 (fails on x86_64):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2119825

Build #2 (fails on i686):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2120174

Build #3 (success on all platforms):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2120215



Re: [Patch 2/8] CLD: cleanup: add a log entry about sent packet

2010-04-14 Thread Jeff Garzik

On 04/14/2010 02:34 PM, Pete Zaitcev wrote:

Currently, there's nothing in the verbose output about sent packets
at all. No, really! This is very confusing, even if I run tcpdump
at the same time. I think we should add this.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 lib/cldc.c |    2 ++
 1 file changed, 2 insertions(+)


applied 2-6




Re: [Patch 7/8] tabled: cleanup: add #include

2010-04-14 Thread Jeff Garzik

On 04/14/2010 02:35 PM, Pete Zaitcev wrote:

Same as everywhere else: missing prototypes, so the compiler never
checks the implementations against their declarations.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 lib/readport.c |    1 +
 test/libtest.c |    1 +
 2 files changed, 2 insertions(+)


applied 7-8
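The following single-file sketch shows what the added #include buys. Once the prototype is in scope, the compiler checks the definition against it. `parse_port()` is a hypothetical function standing in for whatever lib/readport.c actually exports, and the lone prototype plays the role of the header.

```c
#include <stdlib.h>

/* Stand-in for the header's declaration (hypothetical). */
int parse_port(const char *text);

/* Because the prototype above is visible, changing this definition's
 * return or parameter types becomes a "conflicting types" compile error
 * instead of a silent mismatch with callers in other files.  GCC's
 * -Wmissing-prototypes warns when a non-static function is defined with
 * no previous prototype, which catches a forgotten #include. */
int parse_port(const char *text)
{
	char *end;
	long v = strtol(text, &end, 10);

	if (end == text || v <= 0 || v > 65535)
		return -1;	/* not a usable port number */
	return (int)v;
}
```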




Re: [Patch 1/8] CLD: cleanup: add cld_msg_rpc.x

2010-04-14 Thread Jeff Garzik

On 04/14/2010 02:33 PM, Pete Zaitcev wrote:

You know what's weird... Without this, I cannot build an RPM at all,
the rpmbuild complains about unpackaged files and aborts. But
everyone else seems to have no problem? Strange. BTW, I am on
Fedora 14.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 lib/Makefile.am |    1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Makefile.am b/lib/Makefile.am
index ea72426..012d558 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -27,6 +27,7 @@ libcldc_la_SOURCES=   \
common.c\
libtimer.c  \
pkt.c   \
+   cld_msg_rpc.x   \
cld_msg_rpc_xdr.c


that's quite strange, because I built an official rawhide copy just fine 
without this...


Maybe you can try the SRPM from the koji build?

http://koji.fedoraproject.org/koji/taskinfo?taskID=2114193

May I presume you are using make distcheck to generate the tarball for 
your custom RPMs?


Jeff





Re: [Patch 1/3] CLD: End-to-end verbosity

2010-04-06 Thread Jeff Garzik

On 03/31/2010 08:43 PM, Pete Zaitcev wrote:

diff --git a/server/server.c b/server/server.c
index 3208e0f..2d68ee6 100644
--- a/server/server.c
+++ b/server/server.c
@@ -55,7 +55,7 @@ static struct argp_option options[] = {
      "Store database environment in DIRECTORY.  Default: "
      CLD_DEF_DATADIR },
    { "debug", 'D', "LEVEL", 0,
-     "Set debug output to LEVEL (0 = off, 2 = max)" },
+     "Set debug output to LEVEL (0 = off, 1 = debugging)" },
    { "stderr", 'E', NULL, 0,
      "Switch the log to standard error" },
    { "foreground", 'F', NULL, 0,
@@ -64,6 +64,8 @@ static struct argp_option options[] = {
      "Bind to UDP port PORT.  Default: " CLD_DEF_PORT },
    { "pid", 'P', "FILE", 0,
      "Write daemon process id to FILE.  Default: " CLD_DEF_PIDFN },
+   { "verbose", 'v', NULL, 0,
+     "Enable the session-level verbosity" },
    { "strict-free", 1001, NULL, 0,
      "For memory-checker runs.  When shutting down server, free local "
      "heap, rather than simply exit(2)ing and letting OS clean up." },



As is hinted by the current code's debugging switch being an integer 
'level' value, the server [and client?] has increasing levels of 
verbosity.  The debug levels are


0: key messages affecting server operation, only
1: debugging output enabled, sans per-packet output
2: debugging output enabled, including per-packet output

ie. clearly ordered by increasing value == increased verbosity.

As is clearly illustrated when I cut the patch down to the above 
snippet, the user interface you have created gives the user two knobs 
for log verbosity, and it is not clear to a casual user which knob 
controls which sets of messages.  That makes for a -more- confusing user 
interface, because the user must constantly ask themselves the question 
"do I need debug?  or verbose?  I don't know!"


Additionally, this interface change runs counter to other tools, which 
increase verbosity with each added -v switch -- analogous to the existing 
integer-based debug level interface.


If it is truly your desire to permit fine-grained selection of certain 
classes of messages, then don't dick around!  Go ahead and create a 
bitmap log mask which permits fine-grained selection of various 
messages, much like netif_msg_* and netif_msg_init() in the kernel's 
include/linux/netdevice.h.
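A minimal sketch of such a mask, modeled on netif_msg_init(). An out-of-range level selects the caller's default bits, 0 clears the mask, and level N sets the lowest N bits. The LOGMASK_* classes are hypothetical. Their order is chosen so that levels 1 and 2 reproduce the existing debug levels, and the proposed session-level messages become a third class that can be selected on its own. Key server messages are assumed to be logged unconditionally, outside the mask.

```c
#include <stdint.h>

/* Hypothetical message classes, one bit each, like NETIF_MSG_*. */
enum {
	LOGMASK_DEBUG   = 1u << 0,	/* debugging output, no packets */
	LOGMASK_PACKET  = 1u << 1,	/* per-packet output */
	LOGMASK_SESSION = 1u << 2,	/* session-level events */
};

/* Map an integer -d level onto the mask, netif_msg_init() style. */
static uint32_t log_msg_init(int level, uint32_t default_bits)
{
	if (level < 0 || level >= 32)
		return default_bits;
	if (level == 0)
		return 0;
	return (UINT32_C(1) << level) - 1;	/* set the lowest 'level' bits */
}

/* Callers test the relevant class before formatting a message. */
#define log_msg_enabled(mask, cls) (((mask) & (cls)) != 0)
```

With this layout, level 2 yields LOGMASK_DEBUG | LOGMASK_PACKET. A user who wants only session messages could be given an explicit mask instead of guessing between two switches.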


Having two switches, -d and -v, for different, undocumented classes of 
message just increases confusion.  Put yourself in the mind of a user 
trying to figure out which is which.


I readily admit the __internal implementation__ resulting from your 
patches is a useful cleanup, but at a macro level, it merely increases 
logging user interface confusion.


Jeff




Re: [Patch 1/7] tabled: make two dump displays uniform

2010-04-06 Thread Jeff Garzik

On 04/01/2010 09:51 PM, Pete Zaitcev wrote:

From: Jeff Garzik <jgar...@pobox.com>
Subject: Re: Tabled issues
Date: Mon, 29 Mar 2010 15:32:33 -0400



I asserted that the standard stats dump facility must dump
all available statistics.  That does not exclude other methods
of stat(us) dumping.  Your patch added new stats to the HTML-pretty
version of output, but failed to add the new stats to the standard
stat dump facility.


Your wish is my command.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 server/replica.c |   28 +
 server/server.c  |   47 ++
 server/status.c  |   22 +--
 server/storage.c |   50 +
 server/tabled.h  |    3 ++
 5 files changed, 117 insertions(+), 33 deletions(-)


applied, thanks.  I will endeavor to make the stats dump more like nfs4d 
in the future, FWIW.





Re: [Patch 2/7] tabled: fix the endless recursion when reading long objects

2010-04-06 Thread Jeff Garzik

On 04/01/2010 09:51 PM, Pete Zaitcev wrote:

At certain network and disk speeds, tabled can blow its stack by
filling it with (essentially) endless recursion:

#2  0x0040c077 in cli_write_free (cli=<value optimized out>, tmp=
    0x7bb910, done=<value optimized out>) at server.c:397
#3  0x0040ca55 in cli_writable (cli=0x686e90) at server.c:525
#4  0x0040da65 in cli_write_start (cli=0x686e90) at server.c:561
#5  0x00408ad5 in object_get_poke (cli=0x686e90) at object.c:1039
#6  0x0040c077 in cli_write_free (cli=<value optimized out>, tmp=
    0x7bb8d0, done=<value optimized out>) at server.c:397
#7  0x0040ca55 in cli_writable (cli=0x686e90) at server.c:525
#8  0x0040da65 in cli_write_start (cli=0x686e90) at server.c:561
#9  0x00408ad5 in object_get_poke (cli=0x686e90) at object.c:1039
#10 0x0040c077 in cli_write_free (cli=<value optimized out>, tmp=
    0x7bb890, done=<value optimized out>) at server.c:397

The fix is to deliver callbacks only from the top level.

Callbacks must be delivered every time a send is completed,
which amounts to every call to is_writeable(). Since there
is a large number of callers to it, we found it advantageous
to run callbacks from every source of events. In other words,
every function that is passed to event_set must invoke
cli_write_run_compl. Mind that storage.c contains calls to
event_set.

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 server/object.c |    4 +++
 server/server.c |   52 +++---
 server/tabled.h |    6 +
 3 files changed, 50 insertions(+), 12 deletions(-)


applied 2-7
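The fix queues completions in the write path and runs them only from the top-level event handlers. The idea can be sketched as follows. The `compl` and `client` structures, `cli_queue_compl()` and `poke()` are hypothetical simplifications, not tabled's code. `cli_run_compl()` only echoes the cli_write_run_compl name from the patch description. `poke()` imitates object_get_poke() by starting another write from inside a completion.

```c
#include <stddef.h>

struct compl {
	void (*fn)(void *arg);
	void *arg;
	struct compl *next;
};

struct client {
	struct compl *head, *tail;	/* pending completions, FIFO */
};

/* Write path: record the completion, do not call it. */
static void cli_queue_compl(struct client *cli, struct compl *c)
{
	c->next = NULL;
	if (cli->tail)
		cli->tail->next = c;
	else
		cli->head = c;
	cli->tail = c;
}

/* Top-level event handlers only: drain the queue iteratively, so new
 * completions queued by callbacks never deepen the stack. */
static int cli_run_compl(struct client *cli)
{
	int ran = 0;

	while (cli->head) {
		struct compl *c = cli->head;

		cli->head = c->next;	/* detach before running */
		if (!cli->head)
			cli->tail = NULL;
		c->fn(c->arg);
		ran++;
	}
	return ran;
}

/* Illustration: each completion starts the next write, which queues
 * the same completion record again. */
struct poke_state {
	struct client *cli;
	struct compl c;
	int remaining;		/* writes still to start */
	int calls;
};

static void poke(void *arg)
{
	struct poke_state *ps = arg;

	ps->calls++;
	if (--ps->remaining > 0)
		cli_queue_compl(ps->cli, &ps->c);	/* queued, not called */
}
```

Each completion is detached from the queue before its callback runs. A callback may therefore re-queue the same record, and a long chain of pokes runs in a loop at constant stack depth instead of the ever-deepening recursion shown in the backtrace.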




Re: [Patch 1/3] CLD: End-to-end verbosity

2010-04-06 Thread Jeff Garzik

On 04/06/2010 11:32 PM, Pete Zaitcev wrote:

On Tue, 06 Apr 2010 10:40:33 -0400
Jeff Garzik <j...@garzik.org> wrote:


The debug levels are

0: key messages affecting server operation, only
1: debugging output enabled, sans per-packet output
2: debugging output enabled, including per-packet output


The previous patch did just that:
Why did you reject it?


That's a damned good question.  I have no idea.  Did I ever reply to 
that patch?  It looks like I fscked up and missed it?


Jeff





Re: CLD doesn't build on db-4.3

2010-04-01 Thread Jeff Garzik

On 04/01/2010 07:01 AM, Samba - BoYang wrote:

hi, *
 CLD doesn't build on db-4.3 on SUSE 11, since db-4.3 uses the deprecated
structure members DBC->c_xxx (c_close(), etc.) instead of DBC->xxx. :-)

 It won't build on db-4.4 either, and probably won't build on db-4.5, as
db-5.0 says DBC->xxx was introduced in db-4.6. :-) Should we disable
support for 4.3 - 4.5 and add 4.9 - 5.0?


I'd answer yes, by a circuitous route:  if I understand things 
correctly, the replicated PAXOS db4 backend that we are heading towards 
(see the 'replica' branch of 
git://git.kernel.org/pub/scm/daemon/cld/cld.git) was buggy in early db4 
releases.


Therefore, it sounds like we could eliminate two issues, the DBC issue 
and the PAXOS issue, with a single change: removing support for db 4.3 - 4.5.


I'm fine with adding support for 4.9+ as long as the APIs function in a 
compatible manner.
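A sketch of the version rule under discussion. It accepts Berkeley DB 4.6 and later, where the DBC->xxx method names appeared according to the report above, and rejects 4.3 - 4.5. `db_version_ok()` is hypothetical. DB_VERSION_MAJOR, DB_VERSION_MINOR and db_version() are provided by Berkeley DB's <db.h>.

```c
#include <stdbool.h>

/* Hypothetical check: accept Berkeley DB 4.6+, reject 4.3 - 4.5.
 *
 * With <db.h> at hand, the same rule can be enforced at build time:
 *
 *   #if DB_VERSION_MAJOR < 4 || (DB_VERSION_MAJOR == 4 && DB_VERSION_MINOR < 6)
 *   #error "Berkeley DB 4.6 or later is required"
 *   #endif
 *
 * and db_version(&major, &minor, &patch) reports the library actually
 * loaded at run time, which can then be passed to this function. */
static bool db_version_ok(int major, int minor)
{
	return major > 4 || (major == 4 && minor >= 6);
}
```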


Want to create the simple patch for this?  :)

Thanks,

Jeff







Re: [Patch 1/1] tabled: fix a crash when looking up non-existing NID

2010-03-29 Thread Jeff Garzik

On 03/28/2010 09:57 PM, Pete Zaitcev wrote:

Signed-off-by: Pete Zaitcev <zait...@redhat.com>

---
 server/storage.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


applied




Re: [PATCH] chunkd: fix duplicate stc_object allocation in stc_parse_key()

2010-03-16 Thread Jeff Garzik

On 03/16/2010 05:59 AM, Akinobu Mita wrote:

At the beginning of stc_parse_key(), st_object is allocated twice for
the same variable.

Signed-off-by: Akinobu Mita <akinobu.m...@gmail.com>
---
 lib/chunkdc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


good catch, applied


--
To unsubscribe from this list: send the line unsubscribe hail-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >