Re: Announcing CloudBase-1.3.1 release
Sent from my iPhone

On 2009-6-21, 9:16, Leo Dagum leo_da...@yahoo.com wrote:

Thanks for the kind words. I'm copying this reply to the cloudbase mailing list, as further discussion is more appropriate there.

The way indexing is currently implemented, you need to manually update the index as you add more data. If you have not been doing this, that may explain why you are seeing excessive performance degradation as the data grows. If that doesn't help, then we'd love to work with you to help resolve the issue.

- leo

From: Edisonxp ediso...@gmail.com
To: core-user@hadoop.apache.org
Sent: Saturday, June 20, 2009 7:21:11 AM
Subject: Re: Announcing CloudBase-1.3.1 release

I've used cloudbase for several months, since version 1.1. It's a good thing; your team did a great job. These days I'm struggling with a problem: as the data grows, my daily queries become slower and slower. I hope CloudBase could support a 'partition' feature at table creation time. I tried 'index', but it seems nothing changed.

Sent from my iPhone

On 2009-6-20, 3:19, Leo Dagum leo_da...@yahoo.com wrote:

CloudBase is a data warehouse system for Terabyte/Petabyte scale analytics. It is built on top of Hadoop and allows you to query flat files using ANSI SQL. We have released version 1.3.1 of CloudBase on SourceForge: https://sourceforge.net/projects/cloudbase

Please give it a try and send us your feedback. You can follow CloudBase-related discussion on the Google mailing list: cloudbase-us...@googlegroups.com

Release notes - New Features:
* CREATE CSV tables - One can create tables on top of data in CSV (Comma Separated Values) format and query them using SQL. The current implementation doesn't accept CSV records which span multiple lines; data may not be processed correctly if a field contains embedded line breaks. Please visit http://cloudbase.sourceforge.net/index.html#userDoc for the detailed specification of the CSV format.
Bug fixes:
* Aggregate function 'AVG' returns the same value as the 'SUM' function
* If a query has multiple aliases, only the last alias works

You received this message because you are subscribed to the Google Groups CloudBase group. To post to this group, send email to cloudbase-us...@googlegroups.com. To unsubscribe from this group, send email to cloudbase-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/cloudbase-users?hl=en
Re: Too many open files error, which gets resolved after some time
Hi.

After tracing some more with the lsof utility, I managed to stop the growth on the DataNode process, but I still have issues with my DFS client. It seems that my DFS client opens hundreds of pipes and eventpolls. Here is a small part of the lsof output:

java  10508  root  387w  FIFO  0,6   6142565  pipe
java  10508  root  388r  FIFO  0,6   6142565  pipe
java  10508  root  389u        0,100 6142566  eventpoll
java  10508  root  390u  FIFO  0,6   6135311  pipe
java  10508  root  391r  FIFO  0,6   6135311  pipe
java  10508  root  392u        0,100 6135312  eventpoll
java  10508  root  393r  FIFO  0,6   6148234  pipe
java  10508  root  394w  FIFO  0,6   6142570  pipe
java  10508  root  395r  FIFO  0,6   6135857  pipe
java  10508  root  396r  FIFO  0,6   6142570  pipe
java  10508  root  397r        0,100 6142571  eventpoll
java  10508  root  398u  FIFO  0,6   6135319  pipe
java  10508  root  399w  FIFO  0,6   6135319  pipe

I'm using FSDataInputStream and FSDataOutputStream, so this might be related to the pipes?

So, my questions are:
1) What causes these pipes/epolls to appear?
2) More important, how can I prevent their accumulation and growth?

Thanks in advance!

2009/6/21 Stas Oskin stas.os...@gmail.com:

Hi.

I have an HDFS client and an HDFS datanode running on the same machine. When I try to access a dozen files at once from the client, several times in a row, I start receiving the following errors on the client and from the HDFS browse function:

HDFS client: Could not get block locations. Aborting...
HDFS browse: Too many open files

I can increase the maximum number of files that can be opened, as I have it set to the default 1024, but I would first like to solve the problem, as a larger value just means it would run out of files again later on.

So my questions are:
1) Does the HDFS datanode keep any files open even after the HDFS client has already closed them?
2) Is it possible to find out who keeps the files open - datanode or client - so I could pinpoint the source of the problem?

Thanks in advance!
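[Editorial note, not from the original thread.] One plausible explanation for the pipe/eventpoll triplets in the lsof output above: on Linux, each java.nio Selector is typically backed by an epoll instance plus a wakeup pipe, which appears in lsof as one eventpoll entry and a pair of pipe entries. The DFS client does socket I/O through NIO, so streams that are never closed can keep selectors, and hence these descriptors, alive. A minimal JDK-only sketch of the mechanism:

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class SelectorFds {
    public static void main(String[] args) throws IOException {
        // On Linux, Selector.open() usually allocates an epoll instance
        // plus a wakeup pipe: one "eventpoll" and two "pipe" entries in lsof.
        Selector selector = Selector.open();
        System.out.println("open: " + selector.isOpen());   // open: true

        // Closing the selector releases those descriptors again.
        selector.close();
        System.out.println("open: " + selector.isOpen());   // open: false
    }
}
```

Running lsof -p on the JVM's pid before and after close() shows the descriptor count change.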
Re: Too many open files error, which gets resolved after some time
The HDFS/DFS client uses quite a few file descriptors for each open file. Many application developers (but not the Hadoop core) rely on the JVM finalizer methods to close open files. This combination, especially when many HDFS files are open, can result in very large demands for file descriptors by Hadoop clients. As a general rule we never run a cluster with nofile less than 64k, and for larger clusters with demanding applications we have had it set 10x higher.

I also believe there was a set of JVM versions that leaked file descriptors used for NIO in the HDFS core; I do not recall the exact details.

On Sun, Jun 21, 2009 at 5:27 AM, Stas Oskin stas.os...@gmail.com wrote:
[quoted message trimmed]

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
RE: Too many open files error, which gets resolved after some time
IMHO, you should never rely on finalizers to release scarce resources, since you don't know when the finalizer will get called, if ever.

-brian

-----Original Message-----
From: ext jason hadoop [mailto:jason.had...@gmail.com]
Sent: Sunday, June 21, 2009 11:19 AM
To: core-user@hadoop.apache.org
Subject: Re: Too many open files error, which gets resolved after some time

[quoted message trimmed]
Re: Too many open files error, which gets resolved after some time
Just to be clear, I second Brian's opinion. Relying on finalizers is a very good way to run out of file descriptors.

On Sun, Jun 21, 2009 at 9:32 AM, brian.lev...@nokia.com wrote:
[quoted message trimmed]
Re: Too many open files error, which gets resolved after some time
Yes. Otherwise the file descriptors will flow away like water. I also strongly suggest having at least 64k file descriptors as the open file limit.

On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin stas.os...@gmail.com wrote:

Hi.

Thanks for the advice. So you advise explicitly closing each and every file handle that I receive from HDFS?

Regards.

2009/6/21 jason hadoop jason.had...@gmail.com:
[quoted message trimmed]
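[Editorial note, not from the original thread.] To make the explicit-close advice concrete, here is a minimal sketch using plain java.io so the example is self-contained; the same try/finally discipline applies to the FSDataInputStream and FSDataOutputStream handles obtained from a Hadoop FileSystem, closing each one even when an exception is thrown mid-operation:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ExplicitClose {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".txt");

        // Close in finally so the descriptor is released even if
        // write() throws; do not leave it to a finalizer.
        OutputStream out = new FileOutputStream(tmp);
        try {
            out.write("hello".getBytes("UTF-8"));
        } finally {
            out.close();
        }

        // Same pattern on the read side.
        InputStream in = new FileInputStream(tmp);
        try {
            byte[] buf = new byte[5];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, "UTF-8"));  // hello
        } finally {
            in.close();
        }
        tmp.delete();
    }
}
```

With this pattern the number of open descriptors stays bounded by the number of streams actually in use, rather than by GC timing.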
Re: Too many open files error, which gets resolved after some time
Furthermore, if for some reason it is required to dispose of objects after others are GC'd, weak references and a weak reference queue will perform significantly better in throughput and latency - orders of magnitude better - than finalizers.

On 6/21/09 9:32 AM, brian.lev...@nokia.com wrote:
[quoted message trimmed]
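[Editorial note, not from the original thread.] The weak-reference approach mentioned above can be sketched as follows: register each WeakReference with a ReferenceQueue, and have a cleanup thread pull cleared references off the queue to release the associated resource. A JDK-only illustration:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakRefCleanup {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

        Object holder = new Object();
        // Register the weak reference with the queue; once 'holder' is
        // collected, the JVM enqueues 'ref' and a cleanup thread can
        // release the underlying resource (e.g. close a descriptor).
        WeakReference<Object> ref = new WeakReference<Object>(holder, queue);

        holder = null;   // drop the last strong reference
        System.gc();     // request a collection (not guaranteed, usually prompt)

        // remove() blocks until a reference is enqueued (here up to 5s).
        Reference<?> cleared = queue.remove(5000);
        System.out.println("enqueued: " + (cleared == ref));
    }
}
```

Unlike a finalizer, the cleanup runs on a thread the application controls, and enqueueing happens as soon as the collector clears the reference rather than after a finalization pass.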
Re: Announcing CloudBase-1.3.1 release
When you use CloudBase, you can create a different table for each set of daily files. For example, your directory layout would look like this:

logs/
  200905/
    20090501.log.gz
    20090502.log.gz
  200906/
    20090601.log.gz
    20090602.log.gz

You would then create seven tables: four for the daily files, two for the months, and one for everything.

2009/6/20 Edisonxp ediso...@gmail.com:
[quoted message trimmed]
Problem putting a lot of files
Hi,

Is there any restriction on the number of files that can be put? I tried to put/copyFromLocal about 50,573 files to HDFS, but I ran into a problem:

09/06/22 11:34:34 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:34 INFO dfs.DFSClient: Abandoning block blk_8245450203753506945_65955
09/06/22 11:34:40 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:40 INFO dfs.DFSClient: Abandoning block blk_-8257846965500649510_65956
09/06/22 11:34:46 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:46 INFO dfs.DFSClient: Abandoning block blk_4751737303082929912_65956
09/06/22 11:34:56 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:56 INFO dfs.DFSClient: Abandoning block blk_5912850890372596972_66040
09/06/22 11:35:02 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:02 INFO dfs.DFSClient: Abandoning block blk_6609198685444611538_66040
09/06/22 11:35:08 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:08 INFO dfs.DFSClient: Abandoning block blk_6696101244177965180_66040
09/06/22 11:35:17 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:17 INFO dfs.DFSClient: Abandoning block blk_-5430033105510098342_66105
09/06/22 11:35:26 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:26 INFO dfs.DFSClient: Abandoning block blk_5325140471333041601_66165
09/06/22 11:35:32 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:32 INFO dfs.DFSClient: Abandoning block blk_1121864992752821949_66165
09/06/22 11:35:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:39 INFO dfs.DFSClient: Abandoning block blk_-2096783021040778965_66184
09/06/22 11:35:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:45 INFO dfs.DFSClient: Abandoning block blk_6949821898790162970_66184
09/06/22 11:35:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:51 INFO dfs.DFSClient: Abandoning block blk_4708848202696905125_66184
09/06/22 11:35:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:57 INFO dfs.DFSClient: Abandoning block blk_8031882012801762201_66184
09/06/22 11:36:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2359)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
09/06/22 11:36:03 WARN dfs.DFSClient: Error Recovery for block blk_8031882012801762201_66184 bad datanode[2]
put: Could not get block locations. Aborting...
Exception closing file /osmFiles/a/109103.gpx.txt
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)

===
And I checked the log file on one of the datanodes:
===
2009-06-22 11:34:47,888 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_1759242372147720864_66183 terminating
2009-06-22 11:34:47,926 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-2096783021040778965_66184 src: /140.96.89.224:53984 dest: /140.96.89.224:51010
2009-06-22 11:34:47,926 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-2096783021040778965_66184 received exception
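[Editorial note, not from the original thread.] In Hadoop releases of this vintage (0.18/0.19), repeated "Bad connect ack with firstBadLink" and "Could not get block locations" errors during bulk uploads were commonly attributed to the datanode exhausting its pool of block-transfer threads, alongside the OS open-file limit discussed in the "Too many open files" thread above. A commonly suggested, hedged remedy was raising that pool's cap; the value 4096 below is only an illustrative choice, and the property name really is spelled "xcievers":

```xml
<!-- hadoop-site.xml: raise the per-datanode cap on concurrent
     block transfer threads (the default of 256 in this era was
     easily exhausted by a bulk put of ~50k files). -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Raising the per-process descriptor limit (ulimit -n / nofile) on the datanode hosts is the usual companion change.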
problem about put a lot of files
Hi,

Is there any restriction on the number of files that can be put? I tried to put/copyFromLocal about 50,573 files to HDFS, but I ran into a problem:

====
09/06/22 11:34:34 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:34 INFO dfs.DFSClient: Abandoning block blk_8245450203753506945_65955
09/06/22 11:34:40 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:40 INFO dfs.DFSClient: Abandoning block blk_-8257846965500649510_65956
09/06/22 11:34:46 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:46 INFO dfs.DFSClient: Abandoning block blk_4751737303082929912_65956
09/06/22 11:34:56 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:34:56 INFO dfs.DFSClient: Abandoning block blk_5912850890372596972_66040
09/06/22 11:35:02 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:02 INFO dfs.DFSClient: Abandoning block blk_6609198685444611538_66040
09/06/22 11:35:08 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.193:51010
09/06/22 11:35:08 INFO dfs.DFSClient: Abandoning block blk_6696101244177965180_66040
09/06/22 11:35:17 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:17 INFO dfs.DFSClient: Abandoning block blk_-5430033105510098342_66105
09/06/22 11:35:26 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.57:51010
09/06/22 11:35:26 INFO dfs.DFSClient: Abandoning block blk_5325140471333041601_66165
09/06/22 11:35:32 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:32 INFO dfs.DFSClient: Abandoning block blk_1121864992752821949_66165
09/06/22 11:35:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:39 INFO dfs.DFSClient: Abandoning block blk_-2096783021040778965_66184
09/06/22 11:35:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:45 INFO dfs.DFSClient: Abandoning block blk_6949821898790162970_66184
09/06/22 11:35:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:51 INFO dfs.DFSClient: Abandoning block blk_4708848202696905125_66184
09/06/22 11:35:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 140.96.89.205:51010
09/06/22 11:35:57 INFO dfs.DFSClient: Abandoning block blk_8031882012801762201_66184
09/06/22 11:36:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2359)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
09/06/22 11:36:03 WARN dfs.DFSClient: Error Recovery for block blk_8031882012801762201_66184 bad datanode[2]
put: Could not get block locations. Aborting...
Exception closing file /osmFiles/a/109103.gpx.txt
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)
====

And I checked the log file on one of the datanodes:

====
2009-06-22 11:34:47,888 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_1759242372147720864_66183 terminating
2009-06-22 11:34:47,926 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-2096783021040778965_66184 src: /140.96.89.224:53984 dest: /140.96.89.224:51010
2009-06-22 11:34:47,926 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-2096783021040778965_66184 received exception
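A quick first check for this kind of failure is to compare how many file descriptors a process holds open against its per-process limit. A minimal sketch, assuming a Linux /proc filesystem; the current shell ($$) stands in for the datanode JVM here, and on a real node you would substitute the datanode's pid (e.g. from `pgrep -f DataNode`):

```shell
# Count the file descriptors currently open by a process via /proc.
# $$ (this shell) is used as a stand-in for the datanode JVM pid.
pid=$$
open_fds=$(ls /proc/"$pid"/fd | wc -l)
echo "process $pid holds $open_fds open file descriptors"

# Compare against the per-process limit; pipeline errors such as
# "Bad connect ack" may show up as this gap closes on a datanode.
echo "per-process limit: $(ulimit -n)"
```

If the open count on a datanode approaches the limit while a large put is running, file-descriptor exhaustion is a likely cause.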
RE: problem about put a lot of files
Hi,

The maximum number of open files is limited on a Linux box. Please use ulimit to view and modify the limit.

1. View the limit:
# ulimit -a

2. Modify the limit, for example:
# ulimit -n 10240

Best wishes

-----Original Message-----
From: stchu [mailto:stchu.cl...@gmail.com]
Sent: Monday, June 22, 2009 12:57 PM
To: core-user@hadoop.apache.org
Subject: problem about put a lot of files

> Hi, Is there any restriction on the number of files that can be put? I tried to
> put/copyFromLocal about 50,573 files to HDFS, but I faced a problem: [...]
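The ulimit steps above can be collected into a short shell sketch. Assumptions: Bash on Linux; raising the soft limit above the hard limit requires root; and the user name "hadoop" in the persistent-configuration comment is an example, not something stated in the thread:

```shell
# 1. View all current resource limits; "open files" is the relevant row.
ulimit -a

# 2. View only the open-file limit.
ulimit -n

# 3. Raise the limit for this shell and its children (e.g. a daemon
#    restarted from this shell). Raising beyond the hard limit needs root.
ulimit -n 10240 2>/dev/null || echo "could not raise limit; check the hard limit with: ulimit -Hn"

# 4. For a persistent change on most Linux distributions, add lines like
#    these to /etc/security/limits.conf (assuming the Hadoop daemons run
#    as user "hadoop"), then log in again:
#      hadoop  soft  nofile  10240
#      hadoop  hard  nofile  10240
```

Note that `ulimit -n` changes apply only to the current shell and processes started from it, so the datanode must be restarted from a session with the raised limit for it to take effect.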